ABSTRACT

This  paper  considers  agents  who  use  the  experiences  of  their 
neighbors  in  deciding  which  of  two  technologies  to  use.  We 
consider  two  learning  environments,  one  where  the  same  technology 
is  optimal  for  all  players  and  the  other  where  each 
technology  is  better  for  some  of  them.  In  both  environments,  we 
suppose  that  players  use  exogenously  specified  rules  of  thumb 
that  ignore  all  historical  data  but  which  may  incorporate 
a  tendency  to  use  the  more  popular  technology.  These  naive  rules 
can  lead  to  fairly  efficient  decisions  in  the  long  run,  but 
adjustment  can  be  quite  slow  when  a  superior  technology  is 
first  introduced. 

JEL Classifications: D8, C7, N53, O30


1.    Introduction 

This  paper  presents  two  simple  models  of  how  economic 
agents decide which of two technologies to use when the relative
profitability  of  the  technologies  is  unknown.  In  both  models, 
agents  base  their  decisions,  at  least  in  part,  on  the  experience 
of  their  neighbors;  this  is  what  we  mean  by  "social  learning." 
We  believe  that  social  learning  is  frequently  an  important 
aspect  of  the  process  of  technology  adoption,  where  "technology" 
should  be  broadly  construed:  Although  our  main  example  concerns 
the  adoption  of  agricultural  technology  in  the  English 
agricultural  revolution,  we  believe  that  the  models  may  also  be 
applicable  to  such  choices  as  parents'  decisions  whether  to  send 
their  children  to  a  public  or  private  school. 

There  have  been  several  previous  models  of  the  role  of 
social  learning  in  technology  adoption.  Perhaps  the  earliest  is 
the  contagion  process,  which  models  adoption  as  a  random 
matching  process  in  which  players  switch  to  the  new  technology 
the  first  time  they  meet  someone  who  is  using  it;  this  process 
yields  the  familiar  "S-shaped  curve"  for  the  time  path  of 
adoption  that  has  been  widely  used  in  empirical  work. 

Recent papers by Banerjee [1991a], [1991b], Bikhchandani et
al. [1991] and Smith [1990] study more sophisticated models of
social  learning,  in  which  players  must  decide  which  of  two 
choices  is  better.  The  primary  question  of  interest  in  these 
models  is  whether  the  social  learning  is  sufficient  to  prevent 
the  population  from  locking  on  to  the  wrong  decision.  These 
papers  suppose  that  players  observe  one  another's  choices,  but 
that players do not observe the payoffs that these choices
generate. In a model where players choose sequentially, later
decision makers may thus be led to make the same wrong choice as
their predecessors, as the late deciders do not observe that the
early ones now regret their choice. Moreover, this can occur
even though the players could identify the optimal choice with
certainty by pooling their information.

The learning environments we study differ from those of
previous work in three ways. First, we believe that, in the
context of technology adoption, it is more natural to suppose
that players do observe their neighbors' payoffs, as well as
their choices. Second, our paper differs in supposing that
individuals periodically observe other players' outcomes and
reevaluate their own decision. Third, we consider the
possibility that players may be sufficiently heterogeneous that
under full information they would not all make the same choice.
In the diffusion of agricultural innovations, for example,
different technologies may be appropriate for different soils
and climates.

In addition to these differences in the learning
environment, our paper also differs from those cited in the
style of its analysis: Instead of assuming that the adoption
process is described by the equilibrium of a game played by
fully rational agents, we suppose that players use exogenously
specified, and quite simple, "rules of thumb." We have several
reasons for proceeding in this fashion. First, in some of the
environments we consider, fully Bayesian learning requires
calculations that may be too complicated to be realistic. A
second motivation for our approach is that, to the extent that
the technology choice may be substantially different from
previous decisions the players have faced, we would be
uncomfortable  with  the  assumption  that  the  technology  adoption 
process  is  described  by  an  equilibrium.  A  somewhat  different 
motivation  is  simply  technical  expediency:  we  did  not  see  an 
easy  way  to  incorporate  various  considerations  we  feel  are 
important  into  a  rational-actor,  equilibrium,  model. 

The  paper  is  structured  around  two  simple  models  of 
learning  environments.  The  first  one  has  a  homogeneous 
population  of  players  choosing  between  two  competing 
technologies,  with  the  payoff  to  each  technology  subject  to  an 
aggregate  i.i.d.  shock.  Each  period,  only  some  fraction  of  the 
players  has  the  opportunity  to  revise  their  choices;  these 
players  make  their  choices  using  simple  rules  of  thumb. 

Our analysis begins with a particularly "naive" rule of
thumb  in  which  players  ignore  all  historical  data  and  simply 
choose  whichever  technology  worked  better  in  the  previous 
period.  This  rule  will  lead  the  popularity  of  the  two 
technologies  to  fluctuate  unless  one  of  the  technologies  has  a 
higher  payoff  for  all  values  of  the  shock. 

We  subsequently  consider  rules  which  incorporate 
"popularity  weighting,"  a  tendency  to  choose  a  more  popular 
technology  even  if  it  was  somewhat  less  profitable  last  period. 
Since  players  observe  both  the  choices  and  the  payoffs  of  their 
neighbors,  they  would  have  no  reason  to  use  popularity  weighting 
if  they  made  full  use  of  their  information.  However,  we  feel 
that  real-world  decision  makers  often  do  pay  attention  to 
popularity,  and  indeed  our  results  may  help  provide  some 
explanation  for  this  phenomenon. 

Specifically, in our model the appropriate use of
popularity weighting leads players to adopt and stick with the
better technology. Intuitively, a strategy which is more
popular today is likely to have done well in the past, so that
the relative popularity of the technologies can serve as a proxy
for their historical performance. Thus it is fairly clear that
popularity  weighting  rules  can  lead  to  better  decisions;  we  find 
that  one  particular  choice  of  popularity  weights  picks  out  the 
better  technology  in  the  long  run.  This  leads  us  to  ask  whether 
there  are  any  reasons  to  believe  that  this  optimal  popularity 
rule  is  either  particularly  likely  or  particularly  unlikely  to 
be  used.  In  response,  we  present  a  model  of  players  choosing 
popularity  rules  in  which  the  optimal  rule  is  the  unique 
symmetric  equilibrium. 

Our  second  model  has  a  heterogeneous  population,  with  each 
technology  better  for  some  of  the  players.  Thus  the  question 
here  is  not  whether  the  better  technology  will  be  adopted,  but 
rather  whether  the  new  technology  will  be  adopted  by  the 
appropriate players. We suppose that there is a continuum of
players distributed uniformly over a line, and that nearby
players have similar payoffs to the two technologies.
Moreover, we suppose that players base their decisions on the
relative performance of the two technologies at locations that
are within one "window width" of their own. This window width,
which is exogenous in our model, can be thought of as either
the result of an informational constraint (players may not
observe outcomes at far-away locations) or as the result of a
prior belief that players at far-away locations are not
sufficiently similar for their experiences to be informative.

Once again, players revise their technology choices using
simple rules of thumb. In particular, we suppose that players
do  not  know  exactly  how  location  influences  relative  payoffs, 
and  thus  simply  compare  the  average  payoffs  of  the  two 
technologies  in  their  window,  as  opposed  to  using  more 
sophisticated  statistical  methods. 

This second model provides a number of predictions about
the  types  and  magnitudes  of  the  errors  that  are  likely  to  be 
made.  The  spatial  nature  of  the  process  allows  some  degree  of 
social  learning  even  without  popularity  weighting,  and  the 
long-run  state  of  the  system  is  approximately  efficient  when  the 
window  width  is  small.  However,  small  window  widths  imply  that 
the  system  converges  more  slowly,  which  can  be  costly  if  the 
initial  state  is  far  from  the  optimum.  Roughly  speaking, 
increasing  the  popularity  weighting  in  the  spatial  model  has 
about  the  same  effect  as  decreasing  the  window  width. 

Before  proceeding  further,  we  should  acknowledge  that  our 
belief  that  models  of  bounded  rationality  are  a  useful  way  to 
study  social  learning  does  not  mean  that  we  are  completely 
satisfied  with  the  particular  rules  we  consider.  In  particular, 
in  the  first  model  use  of  history  does  not  seem  so  complicated 
as  to  be  unreasonable.  Our  purpose  is  not  to  argue  that  any 
one  of  these  models  is  particularly  compelling,  but  rather  to 
identify  general  properties  that  seem  to  occur  in  some  of  the 
more  obvious  formulations.  One  recurrent  conclusion  is  that  in 
a  number  of  cases  the  long-run  state  of  the  system  is  fairly 
efficient,  even  though  the  individual  decision  rules  are  quite 
naive. 


2.  The English Agricultural Revolution

Before  developing  our  models,  we  would  like  to  use  the 
example  of  the  English  agricultural  revolution  to  introduce  some 
of  the  issues  that  our  models  are  designed  to  address.  The 
revolution  in  question  is  the  improvement  in  English  agriculture 
between  1650  and  1850.  This  improvement  is  usually  attributed 
to  the  spread  of  new  agricultural  practices,  and  in  particular 
to  what  is  called  the  "new  husbandry,"  although  both  the  size  of 
the  improvement  and  its  causes  have  become  more  controversial  in 
recent  years.  The  new  husbandry  refers  to  a  variety  of  new 
crops  and  new  crop  rotations  which  arrived  in  England  from 
Flanders  in  the  17th  century,  based  on  the  idea  of  growing  crops 
such  as  clover  or  turnips  instead  of  leaving  the  land  fallow. 
These  crops  could  rejuvenate  the  soil,  and  the  hoeing  they 
required  had  the  by-product  of  eliminating  weeds.  The  crops 
themselves  can  be  fed  to  livestock,  which  allows  larger  herds  to 
be  carried  over  the  winter,  providing  further  supplies  of 
manure, which in turn could be used for fertilizer. The net
result is (ideally) increased production of both livestock and
grains.

The diffusion of the new husbandry seems to have involved
all of the major aspects of our models that we mentioned in the
introduction. First, it seems likely that farmers were able to
observe, at least roughly, the output of their neighbors, as
well as their neighbors' choice of crops and crop rotations.
Second, the payoffs to various crops were different at different
locations, depending on the soil, climate, and terrain of each
farm. There is not yet a consensus on where the new husbandry
should have been adopted, but it is clear that it was not a
universal improvement: Turnips were most suited to the light
clay soils of Norfolk, and were unprofitable in wet clay soils
like those of the Midland Plain, where the crop had limited
growth, and was so difficult to harvest it was often left to
rot in the field. Third, a crop of turnips could be ruined by
excessive rain or severe frosts, leading us to incorporate the
weather as an annual stochastic shock complicating the learning
process.

Moving  away  from  the  physical  description  of  the  situation 
to  our  (harder  to  verify)  assumptions  about  the  agents' 
behavior,  it  seems  clear  that  the  final  adoption  decisions 
resulted  from  decentralized  learning,  as  opposed  to  a 
pronouncement  from  a  central  authority  (although  there  were 
attempts made along this line, as we discuss below). It also
seems plausible that the farmers may have been less
sophisticated  in  their  use  of  past  observations  than  Bayesian 
learning  would  suggest.  Moreover,  with  capital  markets  poorly 
developed  or  nonexistent,  and  starvation  a  potential  concern,  it 
seems  plausible  that  farmers'  technology  decisions  were 
determined  primarily  by  short-term  considerations,  and  that 
farmers  would  be  unlikely  to  experiment  with  a  technology  with  a 
lower  expected  return. 

The  two  models  we  discuss  explore  two  different  aspects  of 
the  adoption  process.  The  homogeneous-population  model  looks  at 
a  single  location  in  isolation,  and  focuses  on  the  dynamics  of 
the  adoption  process.  It  has  been  frequently  noted  that  farmers 
as  a  group  are  seemingly  very  hesitant  to  try  new  technologies. 
These  comments  do  not  suggest  that  all  farmers  are  equally 
hesitant; for example, Slicher van Bath (op. cit., p. 243) notes
that during the English agricultural revolution, "Land tilled in
very  ancient  ways  lay  next  to  fields  in  which  crop  rotations 
were  followed."  This  observation  fits  with  our  assumption  of 
inertia,  meaning  that  at  each  date  only  some  fraction  of  the 
population  considers  changing  technologies. 

The  apparent  inertia  in  the  diffusion  of  the  new  husbandry 
has  been  criticized  by  both  contemporary  and  modern  authors  as 
slowing  progress,  and  indeed  it  does  slow  the  transition  from  a 
dominated  technology  (one  that  is  worse  in  all  states  of  the 
world)  to  a  new  one.  However,  our  analysis  shows  that  inertia 
may  improve  the  long-run  performance  of  the  process  if  the 
performance  of  the  two  technologies  is  subject  to  sufficiently 
large  random  shocks.  Given  our  assumption  that  players  do  not 
keep  track  of  past  outcomes,  the  process  without  inertia 
oscillates  between  the  two  technologies,  while  a  combination  of 
inertia  with  a  tendency  to  use  the  most  popular  technique 
permits  the  learning  process  to  converge  to  the  better  of  the 
two.

Our second model examines the idea that players learn from
their  "neighbors"  when  a  new  technology  may  turn  out  to  be 
profitable  for  some  but  not  all  of  the  potential  adopters.  (In 
the  case  of  farming,  we  take  this  spatial  structure  literally, 
but  we  also  have  in  mind  learning  from  "neighbors"  who  are 
believed  to  be  similar,  but  who  need  not  be  geographically 
adjacent.)  One  of  the  most  striking  and  frequently  noted 
characteristics  of  the  agricultural  revolution  is  the  slow  rate 
at  which  the  innovations  spread.  Contemporary  observers  in 
England  said  that  the  rate  was  only  one  mile  per  year,  and 
indeed  it  did  take  more  than  a  century  for  the  new  husbandry  to 
make its way across the island.


Many  explanations  have  been  proposed  for  this  slow  spread, 
including  technological  factors  like  pests  and  diseases, 
institutional  factors  such  as  the  enclosures,  and  the  farmers' 
lack  of  education.  While  all  of  these  factors  may  have  played  a 
role,  our  analysis  suggests  that  the  basic  fact  of  a  slow  rate 
of diffusion need not be surprising, as it can be a natural
consequence  of  social  learning,  particularly  when  the  difference 
in  payoffs  is  not  great,  and  when  farmers  pay  attention  to  the 
relative  popularity  of  each  technology  in  making  their 
decisions. 

A  second  and  more  spirited  debate  has  surrounded  the  role 
of  elite  landlords  and  agricultural  reformers  on  the  diffusion 
process. The classic studies of Ernle [1912] and Mantoux [date]
portrayed  the  agricultural  revolution  as  the  result  of  valiant 
struggle  by  innovators  such  as  Jethro  Tull  and  Lord  Townshend  to 
overcome  the  ignorance  of  the  peasant  farmers.  Revisionist 
authors  have  attacked  this  view;  a  main  component  of  their 
argument  has  been  that  the  new  techniques  were  already  in  use  in 
some  areas  of  England  before  many  of  the  so-called  innovators 
were  born. 

Our  model  reinforces  the  pro-elitist  side  of  this  debate  by 
emphasizing  that  popularizers  can  have  a  significant  impact  in 
promoting  the  diffusion  of  a  technique  even  if  they  are  not  the 
first  to  develop  it.  The  arrival  of  the  new  husbandry  in 
England  is  often  dated  to  the  publication  in  1650  of  Sir  Richard 
Weston's  observations  of  agricultural  practices  in  Brabant  and 
Flanders. When one examines the spread of the new husbandry as
detailed by Kerridge [op. cit., pp. 272-279], it appears that
the new husbandry spread out very slowly from a number of
geographically distinct locations. The first adoptions were in
Suffolk,  but  by  1680  the  new  husbandry  had  appeared  in  several 
other  locations  over  a  hundred  miles  away.  By  providing 
information  from  other  counties,  the  agricultural  popularizers 
may  have  promoted  such  subsequent  innovations,  and  sped  up  the 
rate  of  the  technology's  diffusion. 

In  addition,  our  theoretical  model  may  shed  some  light  on 
another  question  which  has  received  less  attention  in  the 
literature:  Was  the  new  husbandry  eventually  adopted  in  all  of 
the  areas  to  which  it  was  suited?  As  a  potential  guide  to 
historical  research,  we  discuss  the  conditions  under  which  naive 
learning  processes  of  the  kind  we  consider  tend  to  generate 
efficient  long-run  outcomes. 

Beyond these historical questions, our spatial model of
learning raises the same basic question that we investigated in
our first, homogeneous-population model: how well do "naive"
learning rules perform? Our answers here focus on the
tradeoff  between  rates  of  adoption  and  efficiency  of  the 
long-run  equilibrium. 

3.  A Simple Model of Homogeneous Populations

Before  considering  social  learning  in  systems  with  a 
heterogeneous  population,  it  is  interesting  to  consider  the 
simpler  case  in  which  the  same  technology  is  optimal  for  all 
players.  This  model  can  be  thought  of  as  describing  behavior  at 
a  single  site  in  the  model  we  consider  later  on,  where  the 
relative  payoffs  vary  with  location.  Suppose  that  there  is  a 
large  (continuum)  population  of  players  at  a  single  site,  each 
of whom must choose whether to use technology f or technology g.
In each period, all players using the same technology receive
the same payoff. (Given our assumption that players observe one
another's payoffs, nothing would be changed if we allowed each
player's payoff to be subject to idiosyncratic shocks.) We
suppose that the payoffs to the two technologies at date t, $u_t^f$
and $u_t^g$, are related by the equation

(1)   $u_t^g - u_t^f = \theta + \varepsilon_t$,

where $\theta$ is a fixed but unknown constant parameter and the $\varepsilon_t$ are
i.i.d. shocks with zero mean and cumulative distribution
function H. (In later sections the constant $\theta$ will vary with
location.) We will assume that $p = 1 - H(-\theta) = \mathrm{Prob}[u_t^g - u_t^f \geq 0]$
is strictly between 0 and 1.

In the initial period, denoted 0, a fraction $x_0$ of the
players are using technology g. After each period, a fraction $\alpha$
of the players have the opportunity to revise their choice.
Very low values of $\alpha$ might correspond to a system in which
individual players make their choices for the duration of their
effective lifetimes, with the revisions corresponding to an
inflow of replacement players; intermediate values might
describe a system in which the choice of a technology is
embodied in a costly capital good that will not be replaced
until it wears out.

We suppose that the players who are revising their choice
can observe the average payoffs of both technologies in the
previous period. However, players do not have access to the
entire history of payoff observations. To justify this
assumption, we suppose that individual players revise their
choices too infrequently to want to keep track of each period's
results, and more strongly that the market at this particular
"location" is too small for a record-keeping agency to provide
this service.

The  simplest  behavior  rule  we  consider  is  the  "unweighted" 
rule  under  which  all  players  who  revise  their  choice  pick  the 
technology  which  did  best  in  the  preceding  period.  Under  this 
adjustment  rule,  the  evolution  of  the  system  is  described  by 

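(2)   $x_{t+1} = \begin{cases} (1-\alpha)x_t + \alpha & \text{with probability } p = \mathrm{Prob}[u_t^g \geq u_t^f], \\ (1-\alpha)x_t & \text{with probability } 1-p = \mathrm{Prob}[u_t^g < u_t^f], \end{cases}$

so that

(2')   $E(x_{t+1} \mid x_t) = (1-\alpha)x_t + \alpha p$.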

Note that this specification is symmetric in its treatment
of the adoption and discontinuance decisions, which corresponds
to the case where the costs of "transition" are small. Our
reading about the English agricultural revolution, as well as
studies of more recent innovations cited in Rogers and Shoemaker
(p. 115) suggest that the amount of discontinuance is an
important factor in the diffusion process.

The following result is standard; it follows from e.g.
theorem 10 of Norman [1968]. (It is also a consequence of part
(b) of proposition 2 below.)

Proposition 1: The system (2) is ergodic, i.e. the time-average
of $x_t$ converges to its expectation with respect to its unique
invariant measure $\mu$. Moreover, $E_\mu(x) = p$, and
$\mathrm{var}_\mu(x) = p(1-p)\alpha/(2-\alpha)$.
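The moments in Proposition 1 can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis; the function names, the parameter values, the burn-in length, and the random seed are all illustrative assumptions. It iterates the two-branch recursion (2) and compares the time-average and time-variance of the fraction using g with the stated mean p and variance p(1-p)a/(2-a), where a is the revision fraction (called alpha in the code).

```python
import random


def simulate_unweighted(alpha, p, x0=0.5, steps=200_000, burn_in=1_000, seed=0):
    """Simulate process (2); return the time-average and time-variance of x_t."""
    rng = random.Random(seed)
    x = x0
    total = 0.0
    total_sq = 0.0
    kept = 0
    for t in range(steps):
        # With probability p technology g did better last period, so every
        # revising player (a fraction alpha) picks g; otherwise they pick f.
        if rng.random() < p:
            x = (1 - alpha) * x + alpha
        else:
            x = (1 - alpha) * x
        if t >= burn_in:  # discard early periods that still depend on x0
            total += x
            total_sq += x * x
            kept += 1
    mean = total / kept
    return mean, total_sq / kept - mean * mean


def stationary_moments(alpha, p):
    """Mean and variance of the invariant measure as stated in Proposition 1."""
    return p, p * (1 - p) * alpha / (2 - alpha)


if __name__ == "__main__":
    alpha, p = 0.2, 0.7  # illustrative values only
    print("simulated:", simulate_unweighted(alpha, p))
    print("stated:   ", stationary_moments(alpha, p))
```

Because successive values of the fraction are serially correlated, the simulated moments approach the stated ones only slowly; the long run length is chosen with that in mind.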

4.  A Single Location with Popularity Weighting

Proposition  1  says  that  observing  the  long-run  fraction  of 
players using technology g reveals the fraction of the time that
g has been the better choice. If the distribution H of $\varepsilon$ is
symmetric, the arm that is more often better is also the arm
with the higher expected payoff. (The same conclusion holds so
long as the amount of asymmetry in H is small compared to $\theta$.)
This  suggests  that  if  all  other  players  in  the  population  are 
choosing  whichever  technology  has  the  highest  current  score, 
each  player  could  gain  by  considering  the  relative  popularity  of 
the  two  technologies,  as  well  as  their  recent  payoffs. 

Intuitively,  the  current  popularity  provides  some 
information  about  the  past  history  of  the  process,  and  thus  can 
serve  as  a  proxy  for  it.  Although  this  information  is  not 
complete,  the  complete  history  of  the  process  is  not  needed  to 
identify  the  better  technology.  One  way  to  interpret  the 
results  of  this  section  is  that  in  some  cases  popularity 
weighting  is  a  good  enough  proxy  for  the  history  that  the  system 
eventually  converges  to  the  correct  choice. 

We  now  develop  a  simple  parametric  model  of  popularity 
weighting  with  a  single  location.  As  above,  we  suppose  that 
only a fraction $\alpha$ of the population updates its choice each
period. Now, though, instead of choosing the technology which
did best last period, the choice rule is

(3)   "Choose g if $u_t^g - u_t^f \geq m(1 - 2x_t)$."

Under this rule, the probability that those players who revise
their choices choose g is $\mathrm{Prob}[\theta + \varepsilon_t \geq m(1 - 2x_t)] = 1 - H(m(1-2x_t) - \theta)$;
when all players use rule (3), the fraction using
g evolves according to

(4)   $x_{t+1} = \begin{cases} (1-\alpha)x_t + \alpha & \text{with probability } 1 - H(m(1-2x_t) - \theta), \\ (1-\alpha)x_t & \text{with probability } H(m(1-2x_t) - \theta). \end{cases}$

The parameter m indexes the amount of popularity weighting;
the case m = 0 corresponds to the unweighted case discussed
above. When $x_t = 1/2$, both technologies are equally popular; in
this case players choose the technology with the highest
current payoff for any value of m. As m grows, players become
more willing to choose the currently popular technology even if
its current payoff is lower.

We use the linear specification of popularity weighting
primarily for analytic convenience. It combines nicely with a
second simplifying assumption that we make in this section, that
the distribution H of the per-period shocks $\varepsilon_t$ is uniform on
$[-\sigma, \sigma]$. This allows us to explicitly compute the long-run
behavior of the system for any m. It also ensures that the
linear class of weighting rules we consider includes one rule
that leads the asymptotic distribution to concentrate on the
optimal choice, namely $m = \sigma$.

Beyond the presumed linearity in $x_t$, another point to note
about decision rule (3) is that it, and any rule that compares
the difference $u_t^g - u_t^f$ to a function of $x_t$, is invariant to
additive transformations of the payoff function, but not to
multiplicative ones: In order to preserve the same decision
rule when the payoff functions are multiplied by a constant A,
the parameter m must be multiplied by the same constant. One
way to see why this must be the case is to note that the
expression $(1-2x)$ is unitless, so the parameter m is measured in
the same units as the payoffs are.
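The invariance property just described can be illustrated with a few lines of code. This is only a sketch under assumed numbers: the helper name chooses_g and the payoff values are hypothetical and chosen purely for illustration.

```python
def chooses_g(u_g, u_f, x, m):
    """Rule (3): choose g when the payoff gap is at least m * (1 - 2x)."""
    return u_g - u_f >= m * (1 - 2 * x)


if __name__ == "__main__":
    x, m = 0.7, 1.0  # g is currently the more popular technology
    print(chooses_g(1.0, 1.2, x, m))         # baseline payoffs
    print(chooses_g(6.0, 6.2, x, m))         # both payoffs shifted by +5
    print(chooses_g(10.0, 12.0, x, m))       # payoffs scaled by 10, m unchanged
    print(chooses_g(10.0, 12.0, x, 10 * m))  # payoffs and m both scaled by 10
```

In this example the additive shift leaves the decision unchanged, scaling the payoffs alone reverses it, and scaling m by the same factor restores it.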

Our assumption that the per-period shocks have a uniform
distribution makes it easy to determine the long-run behavior of
the system. Since the lowest possible value of $\varepsilon_t$ is $-\sigma$, the
lowest possible observation of $u_t^g - u_t^f$ is $\theta - \sigma$. Hence, if $x_t$ is
sufficiently large that $\theta - \sigma \geq m(1-2x_t)$, or equivalently if
$x_t \geq x^g = (m-\theta+\sigma)/2m$, the fraction using technology g is certain
to increase. Likewise, if $x_t \leq x^f = (m-\theta-\sigma)/2m$, the fraction
playing f is certain to increase. (Note that $\sigma > 0$ implies
$x^f < x^g$.) Because the probability of an upward step is minimized at
$x_t = 0$, this probability must be at least $\mathrm{Prob}[\theta + \varepsilon_t \geq m] = (\sigma-m+\theta)/2\sigma = -(m/\sigma)x^f$.
Thus, when $x^f < 0$, so that the system
cannot "lock on" to downward steps, the probability of an
upward step is uniformly bounded away from zero. Similarly, if
$x^g > 1$, the probability of a downward step is uniformly bounded
away from 0.

The above shows that (ignoring knife-edge cases) there are
four possibilities for the long-run behavior of the system: If
$x^g < 1$ and $x^f < 0$, the system is certain to eventually make
enough upward jumps that $x_t > x^g$, so that from any initial
position the system converges with probability 1 to $x_t = 1$. If
$x^g > 1$ and $x^f > 0$, the system converges to $x_t = 0$ from any
initial position. If $0 < x^f$ and $x^g < 1$, the system will
converge (with probability 1) to 0 if $x_0 \leq x^f$, and will converge to
1 if $x_0 \geq x^g$; for $x_0 \in (x^f, x^g)$, the system will also eventually
converge to a steady state, but it has a positive probability of
ending up at each of the two steady states of the system. In
the remaining case, in which $x^f < 0$ and $x^g > 1$, the system will
not converge to either steady state. Instead, the fraction $x_t$
will continue to fluctuate, with the long-run distribution
computed following the statement of proposition 2 below.
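The four cases above amount to a simple classification by the two thresholds. The following sketch computes the thresholds from the formulas in the text and reports which case applies; the function names and the returned labels are illustrative assumptions, and the code requires m > 0 because the thresholds divide by 2m.

```python
def thresholds(theta, sigma, m):
    """Return (x_f, x_g): at or below x_f the share of f is certain to rise,
    at or above x_g the share of g is certain to rise."""
    if m <= 0:
        raise ValueError("the thresholds are defined only for m > 0")
    x_g = (m - theta + sigma) / (2 * m)
    x_f = (m - theta - sigma) / (2 * m)
    return x_f, x_g


def long_run_regime(theta, sigma, m):
    """Classify the long-run behavior, ignoring knife-edge cases."""
    x_f, x_g = thresholds(theta, sigma, m)
    if x_f < 0 and x_g < 1:
        return "converges to 1 from any initial position"
    if x_f > 0 and x_g > 1:
        return "converges to 0 from any initial position"
    if x_f > 0 and x_g < 1:
        return "converges, but the limit may depend on the initial position"
    if x_f < 0 and x_g > 1:
        return "fluctuates without converging"
    return "knife-edge case"


if __name__ == "__main__":
    # Illustrative parameter values: (theta, sigma, m).
    for params in [(0.2, 1.0, 1.0), (-0.2, 1.0, 1.0),
                   (0.2, 1.0, 2.0), (0.2, 1.0, 0.5)]:
        print(params, "->", long_run_regime(*params))
```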

The  above  observations  do  most  of  the  work  required  to 
establish the following claims:

Proposition 2:

(a) Popularity weighting $m = \sigma$ is "optimal" in the sense
that from any $x_0$ the system converges with probability 1 to the
state where everyone uses the better technology.

(b) $m > \sigma$ is "overweighting", in that the system converges
with probability 1 to a steady state, but which steady state is
selected may depend on the initial condition $x_0$. More precisely,
the system converges to the better technology if $|\theta| \geq m - \sigma$,
while for $|\theta| < m - \sigma$ the behavior of the system depends on the
initial condition $x_0$. If $x_0 \geq (m+\sigma-\theta)/2m$, the system converges
to 1 with probability 1; if $x_0 \leq (m-\sigma-\theta)/2m$, the system
converges to 0 with probability 1. If $|\theta| < m - \sigma$ and
$x_0 \in ((m-\sigma-\theta)/2m, (m+\sigma-\theta)/2m)$, the system will eventually
converge to one of the steady states, but both steady states
have positive probability.

(c) With "underweighting," i.e. $m < \sigma$, the system need not
converge to a steady state. It does converge (with probability
1) to the better technology if $|\theta| \geq \sigma - m$, but for $|\theta| < \sigma - m$, the
system has a non-degenerate invariant distribution $\mu$, with

$E_\mu x = \frac{1}{2} + \frac{\theta}{2(\sigma - m)}$, and

$\mathrm{var}_\mu x = \frac{\alpha\sigma \, E_\mu x \, E_\mu(1-x)}{(2-\alpha)\sigma - 2(1-\alpha)m}$.

Proof:

(a) If $m = \sigma$, then $x^g = (2m-\theta)/2m$ is less than 1 iff
$\theta > 0$, and $x^f = -\theta/2m$ is greater than zero iff
$\theta < 0$. The conclusion now follows from the argument in the
text.

(b) It suffices to check that if $\theta > m-\sigma > 0$ then
$x^f < 0$ and $x^g < 1$, that $-\theta > m-\sigma > 0$ implies
$x^f > 0$ and $x^g > 1$, while for $m-\sigma > |\theta|$, $x^f > 0$
and $x^g < 1$.

(c) A similar computation shows that when $|\theta| > \sigma-m$, the
system must converge. Appendix B establishes that the system has a
unique invariant distribution when $\sigma-m > |\theta|$, and
computes the corresponding mean and variance. ■
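
The case analysis in Proposition 2 can be summarized in a few lines of code. The following is a minimal sketch rather than part of the original analysis: the function name `long_run_behavior` and its return labels are illustrative assumptions, the noise is taken to be uniform on $[-\sigma, \sigma]$ as in the text, and $\theta \ne 0$ is assumed so that one technology is in fact better.

```python
def long_run_behavior(m, sigma, theta, x0):
    """Classify the long-run outcome according to Proposition 2.

    m     : popularity weight (assumed > 0)
    sigma : half-width of the uniform noise (assumed > 0)
    theta : mean payoff advantage of g over f (assumed != 0)
    x0    : initial share of players using g
    Returns (label, mean_share); mean_share is None unless the system
    settles into the non-degenerate invariant distribution of part (c).
    """
    if theta == 0:
        raise ValueError("theta = 0: neither technology is better")
    better = "converges to g" if theta > 0 else "converges to f"

    if m <= sigma:
        # Parts (a) and (c): optimal weighting or underweighting.
        if abs(theta) >= sigma - m:
            return better, None
        return "fluctuates forever", 0.5 + theta / (2.0 * (sigma - m))

    # Part (b): overweighting, m > sigma.
    if abs(theta) >= m - sigma:
        return better, None
    x_high = (m + sigma - theta) / (2.0 * m)   # at or above: g wins for sure
    x_low = (m - sigma - theta) / (2.0 * m)    # at or below: f wins for sure
    if x0 >= x_high:
        return "converges to g", None
    if x0 <= x_low:
        return "converges to f", None
    return "converges to g or f, both with positive probability", None
```

With overweighting and a small payoff difference, the label depends on `x0`, which is the sense in which too much popularity weighting can lock in the inferior technology.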

Proposition 2 shows that the system is certain to converge
to the correct choice if the popularity weight $m$ equals $\sigma$,
and that the payoff loss from a wrong choice must be small if $m$ is
close to this level. Thus it is interesting to ask whether
there is any particular reason to suppose that popularity
weights equal or close to $\sigma$ are likely to be used, or
conversely whether there are forces in the model that would drive
the players to use different weights. As a partial response, we
consider a game in which players simultaneously choose their
individual popularity weights, and show that the optimal weight
$m = \sigma$ is its unique symmetric equilibrium outcome. This
result is only a partial response, because it supposes more
sophistication in the determination of the popularity weights than
we find compelling. However, the result does show that popularity
weighting need not conflict with individual incentives.

To define the payoffs in this game, we suppose that players
have a common prior distribution $p$ over $\theta$, and that $p$
assigns positive probability to every neighborhood of $\theta = 0$.
For each $x_0 \in [0,1]$, $\theta \in \operatorname{support}(p)$,
and $m$, let $\mu(\theta, x_0, m)$ be the long-run distribution on
$x$ when all players use weighting $m$. For each value of $\theta$,
the payoff to the profile $m$ is defined to be the mean of the
long-run distribution on payoffs, and the overall payoff is the
expectation of this value with respect to the prior beliefs $p$.
Our use of the long-run payoff criterion here is made solely for
convenience: we would prefer to consider optimal behavior for
discount factors near 0, but then we would want to allow for the
popularity weightings to be chosen repeatedly (popularity has no
relevance in the first period) and would need to consider the
distribution of the state in each period separately.

Since the profile in which all players choose $m = \sigma$ results
in the full-information payoffs, this profile is clearly a Nash
equilibrium of the game. Moreover, it is the only symmetric
equilibrium, as shown in the following proposition.

Proposition 3: If every neighborhood of $\theta = 0$ has positive
probability, the unique symmetric equilibrium of the game in
which players simultaneously choose the weight $m$ they give to
popularity is for all players to choose $m = \sigma$.

Remarks: (1) We do not know whether there are asymmetric
equilibria as well. The long-run behavior of the system when two
or more decision rules are used by a non-negligible proportion
of the population seems difficult to determine. We should also
point out that no symmetric pure-strategy Nash equilibrium
exists in the case of a normal distribution that we consider in
appendix A. This should not be too surprising, since in that
case popularity weighting permits the full-information payoff
to be approximated, but not to be attained exactly. However,
profiles where all players use a "large" amount of popularity
weighting are $\varepsilon$-Nash equilibria.

(2) If players are certain that the absolute value of $\theta$ is
bounded away from zero, there are equilibria in which $m$ can
exceed $\sigma$.

(3) It is difficult to suppose that players using the kinds of
naive learning rules we consider would consciously choose their
popularity weights to maximize their long-run payoff. We prefer
to interpret the equilibrium assumption here as the result of a
long-run adaptive process, but this raises the question of the
relative speeds of adjustment of the process determining $m$ and
that reflecting learning about the technologies.

Proof: Fix any profile where all players use some $m \ne \sigma$.
The idea of the proof is simply that if $m < \sigma$, so everyone is
using too little popularity weighting, then each individual player
would prefer to deviate and give more weight to popularity,
while if $m > \sigma$, each player would prefer to give popularity
a bit less weight.

Since there is a "large number" of players, the aggregate
behavior of the system is unaffected if any single player
deviates. We will show that there is always a deviation that
improves the player's payoff when $|\theta|$ is sufficiently small,
and has no effect on the player's payoff when $|\theta|$ is larger;
the conclusion will then follow from our assumption that every
neighborhood of $\theta = 0$ has positive probability.

(a) Suppose first that $m < \sigma$, and consider a player
deviating to $m' = m+dm$ for some small $dm > 0$. This deviation has
no effect on his long-run payoff if $|\theta| \ge \sigma-m$, for in
this case the payoff difference between the two technologies is
sufficiently strong that $x_t$ converges to the optimal choice, and
any $m' \ge m$ yields the first-best long-run payoff.

If $|\theta| < \sigma-m$, the system does not converge to a steady
state, but has a non-degenerate ergodic distribution. In this
case, deviating to $m'$ leads the player to use g instead of f
whenever

$$m'(1-2x_t) \le u^g_t - u^f_t < m(1-2x_t), \quad\text{or}$$

$$m'(1-2x_t) - \theta \le \varepsilon_t < m(1-2x_t) - \theta.$$

Similarly, the player will now use f instead of g whenever

$$m(1-2x_t) - \theta \le \varepsilon_t < m'(1-2x_t) - \theta.$$

Since the difference in payoff between g and f is $\theta$, the
expected change in the player's per-period payoff is

$$\theta \int \left[ H\bigl(m(1-2x)-\theta\bigr) - H\bigl(m'(1-2x)-\theta\bigr) \right] d\mu(x),$$

where $H$ is the uniform distribution, and $\mu$ is the invariant
distribution on $x$ that was derived in proposition 2. Since
$m(1-2x)-\theta \in [-\sigma,\sigma]$ for all $x \in [0,1]$,
$dH\big|_{m(1-2x)-\theta} = 1/2\sigma$, and so

$$\frac{du}{dm} = \theta \int (2x-1)\,dH\,d\mu(x) = \frac{\theta}{2\sigma} \int (2x-1)\,d\mu(x) = \frac{\theta}{2\sigma}\,\bigl[E_\mu 2x - 1\bigr].$$

Since $E_\mu x > 1/2$ for $\theta > 0$, and $E_\mu x < 1/2$ for
$\theta < 0$, $du/dm > 0$ for all
$\theta \in (-(\sigma-m),0) \cup (0,\sigma-m)$.
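
As a worked check on the sign of this derivative (a sketch that simply substitutes the mean $E_\mu x = 1/2 + \theta/2(\sigma-m)$ stated in Proposition 2(c); no new assumption is introduced beyond $0 < |\theta| < \sigma - m$):

```latex
\frac{du}{dm}
  = \frac{\theta}{2\sigma}\,\bigl[\,2E_\mu x - 1\,\bigr]
  = \frac{\theta}{2\sigma}\cdot\frac{\theta}{\sigma-m}
  = \frac{\theta^{2}}{2\sigma(\sigma-m)} \;>\; 0
  \qquad\text{for } 0 < |\theta| < \sigma-m .
```

The gain from a little more popularity weighting is therefore of second order in $\theta$, and it is strictly positive on both sides of $\theta = 0$, which is why the argument only needs the prior to put positive probability near $\theta = 0$.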

(b) Now consider a profile in which players use an $m > \sigma$, and
consider an individual player deviating to $m' = \sigma$. Suppose
first that $\theta > 0$, so that technology g has the higher payoff.
If $\theta \ge m-\sigma$, the system converges to 1 with probability
1, so that both $m$ and $m'$ yield the full-information long-run
payoff; the same is true if $\theta \le m-\sigma$ and
$x_0 \ge x^g = (m+\sigma-\theta)/2m$.

If $0 < \theta < m-\sigma$ and $x_0 < x^g$, or equivalently
$\theta < (1-2x_0)m+\sigma$, there is positive probability that
$x_t$ converges to 0. In particular, if $\theta < \sigma$, there is
positive probability of converging to 0 from any $x_0 \le 1/2$. If
the system does converge to 0, using weighting $m$ leads to f being
played in almost every period, while using $m' = \sigma$, the
probability of playing g converges to the probability that
$\varepsilon_t \ge m'-\theta$, namely $(\sigma-m'+\theta)/2\sigma$,
which is greater than 0 for all positive $\theta$.

The preceding two paragraphs show that $m'$ does at least as
well as $m$ for all positive $\theta$, and does strictly better if
$x_0 \le 1/2$ and $0 < \theta < \min(\sigma, m-\sigma)$. A symmetric
argument shows that $m'$ does at least as well as $m$ for all
negative $\theta$, and does strictly better if $x_0 \ge 1/2$ and
$0 > \theta > \max(-\sigma, -(m-\sigma))$. Hence if the prior
assigns positive probability to the neighborhood of $\theta = 0$,
$m'$ yields a strictly higher expected payoff from any initial
position $x_0$. ■

While our formal results concern the eventual steady state
of the system, the speed of convergence is of some interest as
well. In particular, consider an initial position where $x_t$ is
small, so that g corresponds to a "new" technology, and suppose
that $\theta > 0$, so that the new technology is in fact an
improvement. Then the share of technology g increases whenever
$\theta+\varepsilon_t \ge m(1-2x_t)$, and since the probability of
this event increases with $\theta$, so does the expected rate of
adoption.$^{17}$ Such a correlation between the extent of
improvement and the speed of adoption has been noted in the
empirical discussions in Mansfield [1968] and Rogers and Shoemaker
[1971], but has not, so far as we know, been addressed in the
learning literature.$^{18}$

Note also that for fixed $\theta$, the speed of convergence
decreases as $\sigma$ increases, so that each period's observation
becomes less informative. More generally, convergence will be
slow if the new technology usually does about as well as the old
one, but occasionally does much better. Furthermore, if the new
technology usually does slightly worse than the old one, but
occasionally does much better (i.e. if the new technology has a
higher mean payoff but a lower median), then naive learning rules
that look only at the recent relative performance will be biased
towards the wrong choice. This is consistent with the
observation that seat belts, insurance, and vaccinations have
been slow to diffuse.
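
The dependence of the adoption rate on $\theta$ and $\sigma$ can be made concrete. The sketch below computes the probability that $\theta + \varepsilon_t \ge m(1-2x_t)$ when $\varepsilon_t$ is uniform on $[-\sigma, \sigma]$; the function name `prob_g_gains_share` is a hypothetical helper, and the comparison across $\sigma$ in the usage note assumes the optimal weighting $m = \sigma$, in which case the probability reduces to $x_t + \theta/2\sigma$.

```python
def prob_g_gains_share(theta, sigma, m, x):
    """Probability that theta + eps >= m * (1 - 2x), eps ~ U[-sigma, sigma].

    theta : mean payoff advantage of g
    sigma : half-width of the uniform noise (assumed > 0)
    m     : popularity weight
    x     : current share of players using g
    """
    threshold = m * (1.0 - 2.0 * x) - theta   # eps must be at least this large
    p = (sigma - threshold) / (2.0 * sigma)   # mass of U[-sigma, sigma] above it
    return min(1.0, max(0.0, p))              # clip to a valid probability
```

For example, starting from a share of 0.1 with $m = \sigma = 1$, raising $\theta$ from 0.2 to 0.5 raises the probability that g gains share; holding $\theta = 0.2$ and moving to $m = \sigma = 2$ lowers it.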

5.    Heterogeneous  Population  with  Linear  Technologies 

Now we turn to the study of heterogeneous populations, in
which different technologies may be optimal for different
individuals. As before, we suppose that there are only two
technologies, denoted f and g, with the mean difference in
payoffs, $E(u^g_t - u^f_t)$, equal to $\theta$. Now though, we think
of $\theta$ as representing a location along a line, so that players
at different locations have different $\theta$'s. In particular, the
optimal rule (both socially and privately) is for players with
positive $\theta$ to use g, and players with negative $\theta$ to
use f, so that the distribution of technology choice has a cut-off
or break-point at $\theta = 0$.

It will be important in the following that the relative
advantage of using technology g at location $\theta$ may be
correlated with the "absolute advantage" of location $\theta$, e.g.
the productivity of the "land." To capture this, we suppose that the
payoffs to the technologies have the following linear form:

$$\begin{cases} u^g_t(\theta) = \theta + \beta\theta + \varepsilon_{1t} \\ u^f_t(\theta) = \beta\theta + \varepsilon_{2t}. \end{cases} \tag{5}$$

With this parameterization, $\beta > 0$ implies that technology g
does better at "good" locations, while when $\beta < 0$, g does
better at bad ones.

In this model, the player's location in $\theta$-space determines
his average payoff to the two technologies. We want to think of
the payoff-relevant variables as being unobservable but
correlated with the observed locations. The idea is that
players do not know exactly which aspects of their locations are
payoff-relevant, or how these aspects influence their payoffs.
For this reason, we do not allow the players to regress the
observed payoffs of each technology on the corresponding values
of $\theta$. Instead, we suppose that players base their decisions
on the average performance of the two technologies at locations in
their "observation windows," where the observation window of the
player at $\theta$ is the interval $[\theta-w, \theta+w]$. We call
$w$ the "window width."

We  have  two  interpretations  in  mind  for  this  model.  First, 
the location parameter $\theta$ may correspond to geographical
location,  with  the  performance  of  the  technologies  linked  to 
variables  such  as  climate  or  terrain  that  are  in  turn  correlated 
with  location.  Second,  the  model  may  describe  adoption 
decisions  at  a  single  village,  where  players  are  differentiated 
by  idiosyncratic  payoff-relevant  characteristics  such  as  wealth 
and  household  size. 

In  studying  geographic  diffusion,  for  example  of  an 
agricultural  technology,  the  observation  window  might  reflect 
the  farmer  only  observing  the  outputs  of  his  neighbors,  and  the 
window  width  w  might  be  fairly  small.  In  studying  adoption  at  a 
single  site,  the  observation  window  corresponds  to  the  players' 
beliefs  about  which  other  players  are  sufficiently  similar  for 
their  experiences  to  be  relevant,  and  players  might  well  observe 
the  actions  and  outcome  of  others  who  are  outside  of  their 
window. To the extent that the relevant characteristics are
difficult to determine, the window widths in this interpretation
might be fairly large. In both interpretations, players might
prefer to weight observations of their immediate neighbors more
heavily than those of players who are farther away, but still
within the observation window; this may be particularly
attractive when the observation window is large.

As in the study of a homogeneous population, we begin by analyzing
the simple rule where players use whichever technology did better in
their window last period; later we will enrich the model to
allow for popularity weighting. To define this rule formally,
suppose that the distribution of players over locations has a
constant density, which we normalize to equal 1, and let
$\bar u^g_t(\theta)$ be the average score realized by those players
in the interval $[\theta-w, \theta+w]$ who used g at period t, with
the convention that $\bar u^g_t(\theta) = -\infty$ if every player in
the interval used f; the average $\bar u^f_t(\theta)$ is defined
analogously.

The (unweighted) decision rule for the player at $\theta$ is
then

(6)   "Play g at period t+1 iff $\bar u^g_t(\theta) - \bar u^f_t(\theta) \ge 0$."

In  the  previous  sections  we  considered  a  model  with  a 
continuum  of  players  and  inertia,  so  that  the  fraction  of 
players  using  each  strategy  can  never  shrink  all  the  way  to  zero 
in  finite  time.  In  our  study  of  spatial  models,  though,  we  will 
suppose  that  there  is  no  inertia  at  individual  locations,  so 
that  all  players  at  each  location  revise  their  choices  each 
period. We do so in part for reasons of convenience, and in
part because in rural areas with low population density it seems
plausible that a technology could be abandoned by everyone in an
observation window after a few bad draws in a row.

As a first step in analyzing the decision rule (6), suppose
that the noise terms $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are
identically zero, so that the system is deterministic. Suppose
further that the initial state of the system is described by a
cut-off rule. That is, suppose that there is a $\theta_t$ such that
all players with $\theta \ge \theta_t$ choose g and all those with
$\theta < \theta_t$ choose f. Then the period t+1 state will be
described by a cut-off rule as well. To see this, note that all
players at $\theta > \theta_t + w$ see only g being played, and hence
will play g in the next period, while all players at
$\theta < \theta_t - w$ play f. Players at every
$\theta \in [\theta_t-w, \theta_t+w]$ see both f and g being played,
with

$$\bar u^g_t(\theta) = \int_{\theta_t}^{\theta+w} (\beta+1)s\,ds \Big/ (\theta+w-\theta_t) = (\beta+1)(\theta+w+\theta_t)/2, \quad\text{and}$$

$$\bar u^f_t(\theta) = \int_{\theta-w}^{\theta_t} \beta s\,ds \Big/ (\theta_t+w-\theta) = \beta(\theta-w+\theta_t)/2. \tag{7}$$

Thus for $\theta_t - w < \theta' < \theta'' < \theta_t + w$, we have

$$\bar u^g_t(\theta'') - \bar u^f_t(\theta'') = \bar u^g_t(\theta') - \bar u^f_t(\theta') + (\theta''-\theta')/2,$$

so that if the player at $\theta'$ plays g in period t+1 then so does
the player at $\theta''$. Hence the state at period t+1 is described
by a cut-off rule.

A steady state cut-off rule must have the property that the
player at the steady-state cut-off is indifferent between f
and g given his observations. Thus the steady state is the
unique solution of

$$\bar u^g_t(\theta^*) = \bar u^f_t(\theta^*), \quad\text{that is,}\quad (\beta+1)(\theta^* + w/2) = \beta(\theta^* - w/2),$$

and so

$$\theta^* = -(2\beta+1)w/2. \tag{8}$$

Note that although the optimal cut-off is $\theta = 0$ for any
value of $\beta$, the steady state cut-off is only at 0 if
$\beta = -1/2$. When $\beta = 0$, for example, the payoff to f is
identically zero, while the payoff to g is equal to $\theta$. Hence
when the cut-off is at 0, the average payoff to players using g is
strictly positive, which will tempt players to the left of 0 to
adopt g as well. The discrepancy between the steady state and the
optimum arises from our assumption that players do not directly
observe $\theta$, and hence use only the average payoffs received by
the two technologies in making their decisions. Note that the
maximum steady-state payoff loss at any location is the absolute
value of $\theta^*$, which is small if $\beta$ is not too large (in
absolute value) and the window width $w$ is small.
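
The closed forms in (7) and (8) are easy to check numerically. The sketch below is illustrative only: the function names `window_averages` and `steady_state_cutoff` are hypothetical, the noise terms are set to zero, and the window averages are valid only for a player who sees both technologies, i.e. one located within $w$ of the current cut-off.

```python
def window_averages(theta, cutoff, w, beta):
    """Average payoffs of g and f seen from `theta`, per equation (7).

    Assumes no noise and |theta - cutoff| <= w, so that the window
    [theta - w, theta + w] contains users of both technologies.
    """
    u_g = (beta + 1.0) * (theta + w + cutoff) / 2.0   # g users lie in [cutoff, theta + w]
    u_f = beta * (theta - w + cutoff) / 2.0           # f users lie in [theta - w, cutoff]
    return u_g, u_f


def steady_state_cutoff(w, beta):
    """Steady-state cut-off from equation (8)."""
    return -(2.0 * beta + 1.0) * w / 2.0
```

Evaluating `window_averages` at `theta = cutoff = steady_state_cutoff(w, beta)` returns two equal numbers, which is the indifference condition that defines the steady state; with `beta = -0.5` the cut-off is exactly 0, the social optimum.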

Having determined the steady-state cut-off, we next
examine the behavior of the system away from the steady state.
It is easy to show that, from an initial cut-off $\theta_0$, the
cut-off will move towards the steady state $\theta^*$ at a distance
of $w$ each period until it is within $w/2$ of $\theta^*$. Once
$\theta_t$ is within this interval the system typically enters a
stable 2-period cycle about $\theta^*$. For ease of reference, we
summarize this as a proposition.

Proposition 4: From an initial cut-off $\theta_0$, the system
determined by (6) and (7) evolves according to

$$\theta_{t+1} = \begin{cases} \theta_t + w & \theta_t < \theta^* - w/2, \\ -\theta_t + 2\theta^* & \theta_t \in [\theta^*-w/2,\ \theta^*+w/2), \\ \theta_t - w & \theta_t \ge \theta^* + w/2. \end{cases} \tag{9}$$

Proof: If $\bar u^g_t(\theta_t-w) - \bar u^f_t(\theta_t-w) \ge 0$,
then all players who observe both technologies being played (i.e.
all players in the interval $[\theta_t-w, \theta_t+w]$) use g in
period t+1. Substituting $\theta = \theta_t - w$ into equation (7),
we see that this is the case if
$(\beta+1)\theta_t \ge \beta(\theta_t-w)$, or
$\theta_t \ge -\beta w = \theta^* + w/2$. Similarly, if
$\theta_t < \theta^* - w/2$, all players who see both technologies
being played choose f in period t+1. Finally, if
$\theta_t \in [\theta^*-w/2, \theta^*+w/2)$, $\theta_{t+1}$ will
satisfy
$(\beta+1)(\theta_{t+1}+\theta_t+w) = \beta(\theta_{t+1}+\theta_t-w)$,
so that

$$\theta_{t+1} = -\theta_t - (2\beta+1)w = -\theta_t + 2\theta^*. \qquad ■$$
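
The deterministic dynamics in (9) can be traced with a short script. This is a minimal sketch with hypothetical helper names (`next_cutoff`, `cutoff_path`); the parameter values in the usage note are illustrative and not taken from the paper.

```python
def next_cutoff(theta_t, theta_star, w):
    """One step of the deterministic cut-off map, equation (9)."""
    if theta_t < theta_star - w / 2.0:
        return theta_t + w                  # everyone who sees both picks f
    if theta_t < theta_star + w / 2.0:
        return 2.0 * theta_star - theta_t   # reflection about the steady state
    return theta_t - w                      # everyone who sees both picks g


def cutoff_path(theta_0, w, beta, periods):
    """Cut-offs theta_0, ..., theta_periods for window width w and slope beta."""
    theta_star = -(2.0 * beta + 1.0) * w / 2.0   # steady state from equation (8)
    path = [theta_0]
    for _ in range(periods):
        path.append(next_cutoff(path[-1], theta_star, w))
    return path
```

With `w = 0.25` and `beta = 0` the steady state is -0.125. Starting from `theta_0 = 1.1`, the cut-off falls by 0.25 per period until it enters the band of width $w$ around the steady state, and then alternates between two points that are symmetric about -0.125, which is the 2-period cycle described in the text.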

Next we consider the behavior of the model with noise, i.e.
with $\varepsilon_{1t}$ and $\varepsilon_{2t}$ non-degenerate i.i.d.
random variables. Let $z_t = \varepsilon_{2t} - \varepsilon_{1t}$
denote the difference in the two shocks, and let
$\theta^*_t = \theta^* + z_t$; $\theta^*_t$ is the steady state of
the system when $\varepsilon_{2\tau} - \varepsilon_{1\tau}$ is
identically equal to $z_t$ for all $\tau$. Because behavior rule (6)
depends only on the difference between the payoffs to f and g,
and not on their levels, the evolution of the system from
$\theta_t$ when the shock is $z_t$ is the same as that given in
equation (9), with the term $\theta^*$ replaced everywhere by
$\theta^*_t$.

Proposition 5: If the period-t cut-off is $\theta_t$, and the
period-t shock is $z_t$, the period t+1 cut-off is given by

$$\theta_{t+1} = \begin{cases} \theta_t + w & \theta_t < \theta^*_t - w/2, \\ -\theta_t + 2\theta^*_t & \theta_t \in [\theta^*_t-w/2,\ \theta^*_t+w/2), \\ \theta_t - w & \theta_t \ge \theta^*_t + w/2. \end{cases} \tag{10}$$

Proof: For locations $\theta \in [\theta_t-w, \theta_t+w]$, the
difference between the average payoffs of the two technologies in
$\theta$'s observation window (the interval $[\theta-w, \theta+w]$),
i.e. $\bar u^g_t(\theta,\varepsilon_{1t}) - \bar u^f_t(\theta,\varepsilon_{2t})$,
is $[\theta + \theta_t + (2\beta+1)w]/2 - z_t = (\theta + \theta_t - 2\theta^*_t)/2$.
Since $\theta_t \ge \theta^*_t + w/2$ implies
$\theta+\theta_t \ge 2\theta^*_t$ for all $\theta \ge \theta_t - w$,
$\theta_t \ge \theta^*_t + w/2$ implies that all players who observe
both technologies choose g. Similarly, $\theta_t < \theta^*_t - w/2$
implies that all players who see both technologies choose f.
Finally, if $\theta_t \in [\theta^*_t-w/2, \theta^*_t+w/2)$, the
period t+1 cut-off is given by $\theta_{t+1} = -\theta_t + 2\theta^*_t$. ■

Proposition 6: When the $z_t$ are i.i.d. draws from a
distribution that has a strictly positive density on a compact
support, the dynamic process generated by (10) has a unique
invariant distribution $F$, and the expected probability
distribution at date t converges to $F$ uniformly over initial
probability distributions $p_0$.

Proof: Appendix C shows that the system is a random
contraction in the sense of Norman [1972] and satisfies
uniqueness condition 2.11 of Futia [1982].

We have not been able to characterize this distribution
directly. Instead, we have computed an invariant distribution
of the simpler system generated by

$$\theta_{t+1} = \begin{cases} \theta_t + w & \theta_t < \theta^*_t, \\ \theta_t - w & \theta_t \ge \theta^*_t. \end{cases} \tag{11}$$

Note that system (11) differs from (10) only when $\theta_t$ falls in
an interval of width $w$. Normally we will think of the variance of
$z_t$ as being much larger than the window width; in this case it
may be reasonable to guess that the invariant distributions of
(10) and (11) are close together.

We should point out that the simplified system (11), unlike
(10), does not have a unique invariant distribution: Because all
steps have size $w$, from initial position $\theta_0$, the support of
(11) is concentrated on the grid $\theta_0 + kw$, and so different
initial conditions lead to different invariant distributions.
Moreover, the supports of the date-t distributions are different for
t even and for t odd. Despite these qualitative differences
between systems (10) and (11), the absolute magnitude of the
effect of the initial condition is small when $w$ is small, which
supports the conjecture that the two systems are similar. Table
1 below provides further support for this belief by comparing
Monte Carlo estimates of the steady-state variance of (10) with
the variance of the particular invariant distribution of (11)
that is computed in proposition 7. As conjectured, the two
variances are close when $w$ is small.
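
A comparison of this kind can be sketched as a small Monte Carlo experiment. The code below is not the authors' procedure, which the text does not describe: the function name `steady_state_variance`, the normalization $\sigma = 1$, the choice $\beta = 0$, the starting point $\theta^*$, the run length, and the seed are all assumptions made for illustration. It estimates the time-average variance of the cut-off under system (10) or under the simplified system (11) with uniform noise.

```python
import random


def steady_state_variance(system, w, sigma=1.0, beta=0.0,
                          periods=200_000, seed=0):
    """Time-average variance of the cut-off under system 10 or 11.

    z_t is drawn uniformly on [-sigma, sigma]; the path starts at the
    noiseless steady state theta* = -(2*beta + 1) * w / 2.
    """
    rng = random.Random(seed)
    theta_star = -(2.0 * beta + 1.0) * w / 2.0
    theta = theta_star
    total = 0.0
    total_sq = 0.0
    for _ in range(periods):
        target = theta_star + rng.uniform(-sigma, sigma)   # this period's theta*_t
        if system == 10 and target - w / 2.0 <= theta < target + w / 2.0:
            theta = 2.0 * target - theta                   # middle branch of (10)
        elif theta < target:
            theta += w                                     # step up by w
        else:
            theta -= w                                     # step down by w
        total += theta
        total_sq += theta * theta
    mean = total / periods
    return total_sq / periods - mean * mean
```

Calling `steady_state_variance(10, w=0.1)` and `steady_state_variance(11, w=0.1)` gives two estimates that can be set against $\sigma w/2$, the variance of the binomial invariant distribution of (11).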

To examine the invariant distributions of (11), suppose
that the noise terms $z_t$ are i.i.d. with mean 0 and c.d.f. $H$.
Then $\theta_t$ follows a Markov process with the transition from
$\theta_t$ to $\theta_t + w$ having probability
$\operatorname{Prob}[\theta^* + z_t > \theta_t] = 1 - H(\theta_t - \theta^*)$.
The invariant distribution has a particularly simple form when the
$z_t$ are uniform on $[-\sigma, \sigma]$ and the parameters are such
that there is an invariant distribution whose support is a symmetric
grid containing the points $\theta^* - \sigma$ and
$\theta^* + \sigma$.

Proposition 7: Suppose the $z_t$ are uniform on $[-\sigma, \sigma]$,
and that $M = \sigma/w$ is an integer. Then one invariant
distribution of (11) is the binomial

$$\operatorname{Prob}(\theta = \theta^* + kw) = \frac{(2M)!}{(M-k)!\,(M+k)!}\; 2^{-2M};$$

this is the limit of the time-average distribution when the
initial condition belongs to the grid $\theta^* \pm kw$, $k \le M$.

Remark: Recall that the mean of this distribution is $\theta^*$, its
variance is $\sigma w/2$, and that the distribution is asymptotically
normal as $w$ tends to zero.

Proof: To show that $f$ is an invariant distribution, it is
sufficient to verify that it meets the "detailed balance
condition" that for all $\theta$ and $\theta'$, the (unconditional)
probability flow from $\theta$ to $\theta'$ equals the probability
flow in the reverse direction. Thus, we will verify that

$$f(\theta)\operatorname{Prob}(\theta_{t+1} = \theta' \mid \theta_t = \theta) = f(\theta')\operatorname{Prob}(\theta_{t+1} = \theta \mid \theta_t = \theta'),$$

or equivalently that

$$f(\theta)/f(\theta') = \operatorname{Prob}(\theta_{t+1} = \theta \mid \theta_t = \theta') \big/ \operatorname{Prob}(\theta_{t+1} = \theta' \mid \theta_t = \theta).$$

Since the probability of a jump of more than $w$ is zero, it
suffices to check this condition between adjacent states, so
take $\theta = \theta^* + kw$ and $\theta' = \theta^* + (k+1)w$ for
some integer $k$ between $-M$ and $M-1$. For such states, we have

$$\frac{f(\theta)}{f(\theta')} = \frac{2^{-2M}\,(2M)!/[(M+k)!\,(M-k)!]}{2^{-2M}\,(2M)!/[(M+k+1)!\,(M-k-1)!]} = \frac{M+k+1}{M-k},$$

$$\frac{\operatorname{Prob}(\theta_{t+1} = \theta \mid \theta_t = \theta')}{\operatorname{Prob}(\theta_{t+1} = \theta' \mid \theta_t = \theta)} = \frac{(\sigma+(k+1)w)/2\sigma}{(\sigma-kw)/2\sigma} = \frac{(M+k+1)w}{(M-k)w},$$

so detailed balance holds. ■
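
The detailed balance argument can also be checked with exact arithmetic. The following sketch uses Python's `fractions` module; the helper names (`binomial_weights`, `up_probability`, `detailed_balance_holds`) are hypothetical, and states are indexed by the integer $k$ in $\theta = \theta^* + kw$, with $\sigma = Mw$ and uniform noise as in Proposition 7.

```python
from fractions import Fraction
from math import comb


def binomial_weights(M):
    """Candidate invariant distribution: Prob(k) = C(2M, M + k) / 2^(2M)."""
    return {k: Fraction(comb(2 * M, M + k), 4 ** M) for k in range(-M, M + 1)}


def up_probability(k, M):
    """Probability of moving from k to k + 1, i.e. 1 - H(kw) = (M - k) / 2M."""
    return Fraction(M - k, 2 * M)


def detailed_balance_holds(M):
    """Check f(k) P(k -> k+1) == f(k+1) P(k+1 -> k) for all adjacent pairs."""
    f = binomial_weights(M)
    return all(
        f[k] * up_probability(k, M) == f[k + 1] * (1 - up_probability(k + 1, M))
        for k in range(-M, M)
    )


def variance_in_grid_units(M):
    """Variance of k under the binomial weights (multiply by w**2 for theta)."""
    f = binomial_weights(M)
    return sum(k * k * p for k, p in f.items())
```

Because the weights are symmetric about $k = 0$, the variance of $k$ is $M/2$, so the variance of $\theta$ is $(M/2)w^2 = \sigma w/2$, in agreement with the remark following Proposition 7.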


    $w/\sigma$     System (10)     System (11)

    .5             .21             .25
    .1             .048            .05
    .05            .024            .025
    .001           .00496          .005

TABLE 1:

STEADY STATE VARIANCE FOR UNIFORM NOISE

As one would expect, the variance of the steady state is
increasing in $w$, because small $w$ corresponds to small steps in
each period. Note that the social optimum is the constant
$\theta = 0$, and that the expected welfare loss (compared to
$\theta = 0$) when the cut-off is $\theta_t$ is

$$\int_0^{\theta_t} \theta\,d\theta = \theta_t^2/2.$$

Hence, in the long run the average per-period welfare loss
(using the invariant distribution computed in proposition 7) is
$\frac{1}{2}E(\theta^2) = \frac{1}{2}(E(\theta))^2 + \frac{1}{2}\operatorname{var}(\theta) = [(2\beta+1)^2 w/8 + \sigma/4]\,w$,
so that steady state welfare is decreasing in $w$. For small $w$,
despite the lack of either memory or popularity weighting, the
spatial nature of the process allows the long-run outcome to be
approximately efficient.$^{19}$
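
The long-run loss is straightforward to evaluate. The sketch below computes $\frac{1}{2}(E\theta)^2 + \frac{1}{2}\operatorname{var}(\theta)$ directly from the mean $\theta^* = -(2\beta+1)w/2$ and the variance $\sigma w/2$ given in the remark to Proposition 7; the function name `long_run_welfare_loss` is hypothetical and the parameter values in the usage note are illustrative.

```python
def long_run_welfare_loss(w, sigma, beta):
    """Average per-period welfare loss (1/2) E[theta^2] in the long run.

    Uses the mean and variance of the binomial invariant distribution
    of the simplified system (11).
    """
    mean = -(2.0 * beta + 1.0) * w / 2.0   # steady-state cut-off theta*
    variance = sigma * w / 2.0             # variance of the invariant distribution
    return 0.5 * mean ** 2 + 0.5 * variance
```

With `sigma = 1` and `beta = 0`, halving the window width from 0.2 to 0.1 roughly halves the loss, since the variance term is linear in `w` and dominates the squared-mean term when `w` is small.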

While small $w$'s are thus desirable from the viewpoint of
the time-average payoff, they entail a significant short-run
welfare loss when the initial state is far from the optimum,
because in this case the system will take a long time to
approach the neighborhood of the optimum. This is true for two
reasons: First, $\theta_t$ is limited to move at most $w$ per period.
Second, in the presence of noise a typical path is likely to
take far more than $\theta_0/w$ periods to reach a neighborhood of
$\theta^*$, because many steps will be in the wrong direction.

For a fixed initial condition and social discount factor,
the socially optimal window width will trade off the speed of
convergence and the steady state variance, with larger $w$'s being
optimal the farther the initial condition is from 0. If the
social planner does not know the initial condition and/or the
location of the social optimum, the size of the optimal $w$ will
depend on the planner's prior beliefs. This tradeoff between
speed of adjustment and the variance of the steady state seems a
natural feature of the sorts of model we consider.$^{20}$

At this point we would like to make a few observations
about how the conclusions might change if the players did keep
records of their past observations. Since players at locations
within $\sigma$ of $\theta^*$ will play both technologies infinitely
often, they could eventually learn which technology is better for
themselves by keeping such records. However, a few calculations
suggest that this learning process will be fairly slow if the
random shock to the payoffs has a sizable common component and $w$
is small.

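To see this, suppose that the payoffs to each technology
are subject to a common shock $\eta_t$ as well as the idiosyncratic
shocks we assumed before, so that system (5) is replaced by

$$\begin{cases} u^g_t(\theta) = \theta + \beta\theta + \varepsilon_{1t} + \eta_t \\ u^f_t(\theta) = \beta\theta + \varepsilon_{2t} + \eta_t. \end{cases} \tag{5'}$$
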

If the variance of $\eta$ is relatively large, then
observations of only one technology at date t are not very
informative, and only observations of both technologies in the
same period will be of much help. Players at locations more than
three standard deviations from $\theta^*$ (that is, outside of the
interval $\theta^* \pm 3(\sigma w/2)^{1/2}$) rarely see both
technologies played, and hence would need a very long memory to
learn. Players at locations $\theta$ closer to $\theta^*$ do see both
technologies played more often, but for these players the
systematic payoff difference between the technologies is smaller,
and hence it may require many observations to be fairly confident
one is better. Our informal approximations, reported in appendix E,
suggest that this is indeed the case, and in particular that the
number of periods required to be fairly confident which technology
is better is of the order of $\sigma^2/w^2$, so that when $w$ is
small a very long history would be required for players to do much
better than with our simple rule. Of course, players could use
history even when the advantage to doing so is slight or slow to
develop, but in these cases it seems less obvious that players
would be led to abandon simple rules.

6.    Examples  of  non-linear  Technologies 

Before considering the implications of popularity weighting
in a heterogeneous population, we would like to discuss some
examples of what can happen without popularity weighting when
the payoffs as a function of location do not take the linear
form presumed in equation (5). Suppose for example that the
"old" technology f has returns that are identically zero, while
$g(\theta) = \cos(\theta)$, so that regions where g is optimal
alternate with regions where f is. (See figure 1.) If there is no
noise in the system, and the window width is relatively small, then
even if all players in locations $\theta \in [-\pi/2, \pi/2]$ adopt
the new technology g, the new technology will not spread to the
other regions where it is optimal. In this example there are
substantial social gains from having the new technology "tested" at
a number of diverse locations. (It is for this reason that we would
argue that the gentleman farmers may have played an important role
in the spread of the new husbandry, even though they were not among
the first to adopt it.)

It may also be interesting to note that when the local process may
fail to spread as widely as it should, random shocks to payoffs can
increase social welfare, that is, welfare can increase as the variance
of the noise term $z_t$ increases from zero. Suppose that the
technologies are $f(\theta) = 0$ and $g(\theta) = \cos(\theta)$, and
that the initial state has all players to the right of 0 using g and
players to the left using f. Without noise, the cutoff will move to
$\theta^* = 3\pi/2$ and stay there. (See figure 1.) When the support of
$z_t$ is sufficiently large, there will eventually be enough
consecutive draws of very negative $z_t$ that the cutoff reaches
$\pi/2$. From this point, the system may no longer have a single
cutoff, as players to the left of $\pi/2$ will tend to switch to g,
while those to the right switch back to f. Essentially, the noise
leads the players in region II to use the new technology long enough
that it can spread from region I to region III.

Figure 1: $g(\theta) = \cos\theta$

The next example shows that in certain extreme cases the
specification error involved in ignoring how payoffs vary with
"location" can allow a technology that is everywhere inferior to
completely drive out a better one. This is the case depicted in
figure 2 below, in which $f(\theta) = \theta$ and
$g(\theta) = \theta - \epsilon$. If the current cut-off is at
$\hat\theta_t$, then the player at
$\theta \in [\hat\theta_t - w, \hat\theta_t + w]$ computes
$u^g_t(\theta) = \hat\theta_t - \epsilon + (\theta - (\hat\theta_t - w))/2$,
and $u^f_t(\theta) = \hat\theta_t - w + (\theta - (\hat\theta_t - w))/2$.
Since $u^g_t(\theta) - u^f_t(\theta) = w - \epsilon$, if $w > \epsilon$
all players who observe both technologies choose technology g. Hence
$\hat\theta_{t+1} = \hat\theta_t - w$, and eventually g will take over
the entire population.
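The window calculation in this example can be checked numerically. The sketch below is illustrative only: it assumes a uniformly distributed population, that players at or above the cut-off use g while those below use f, and it approximates window averages on a fine grid. The function name `window_averages` and its parameters are hypothetical, not taken from the paper.

```python
def window_averages(theta, cutoff, w, eps, n=100_000):
    """Average observed payoffs of g and f in the window [theta - w, theta + w].

    Assumptions (for illustration): uniform population, f(x) = x,
    g(x) = x - eps, and players at or above `cutoff` use g.
    The player at `theta` must see both technologies, i.e.
    cutoff - w < theta < cutoff + w.
    """
    lo = theta - w
    step = 2.0 * w / n
    g_sum = f_sum = 0.0
    g_count = f_count = 0
    for i in range(n):
        x = lo + (i + 0.5) * step  # midpoint of the i-th grid cell
        if x >= cutoff:
            g_sum += x - eps
            g_count += 1
        else:
            f_sum += x
            f_count += 1
    return g_sum / g_count, f_sum / f_count


if __name__ == "__main__":
    u_g, u_f = window_averages(theta=0.3, cutoff=0.0, w=1.0, eps=0.25)
    # The difference should be close to w - eps, whatever theta is.
    print(u_g - u_f)
```

The difference does not depend on the observer's location, which is why every player who sees both technologies switches to g whenever the window width exceeds the payoff gap.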

We should point out that these technologies are quite special: an
inferior technology can only drive out a better one if the difference
in payoffs $|f-g|$ is small compared to the errors caused by
estimating the payoffs by their average values in the window. These
errors are of the magnitude of $w\,df/d\theta$ and $w\,dg/d\theta$,
which bound the difference between the payoffs at $\theta - w$ and
$\theta + w$. Thus, if $w$ is small, the difference in payoffs $|f-g|$
must be small as well in order for the inferior technology to
dominate, and hence even though the wrong technology is adopted
everywhere, the payoff loss at each location is not substantial. (In
the example above, the payoff loss at each location is $\epsilon$, and
$\epsilon$ must be less than $w$ in order for g to dominate.)

Figure 2

For small window widths, a more substantial payoff loss arises when
the new technology is not adopted in a region where it is a
substantial improvement. This was the case in the example where
$g(\theta) = \cos(\theta)$ and $f(\theta) = 0$, so that the regions
where g should be adopted are disconnected. We can also modify the
example of figure 2 so that g is better than f at every location (and
so in particular is better on a connected set) and yet a substantial
payoff loss results from g failing to spread. In figure 3, the payoffs
to f and g are such that g is much better than f in the neighborhood
of $\theta = 0$, but is only slightly better than f for extreme
$\theta$ values. Hence, if technology g is first introduced at these
extreme values, it will be driven out of the population before it can
be tried in the center region.

Figure 3

7. Heterogeneous populations and popularity weighting

Our analysis of social learning in homogeneous populations showed
that popularity weighting could improve the aggregate performance of
the learning process, and that the optimal level of popularity
weighting is consistent with individual incentives. We will now
investigate the implications of popularity weighting in our model of a
heterogeneous population with linear technologies.

To model popularity weighting, let $x_t(\theta)$ be the fraction of
players in the interval $[\theta - w, \theta + w]$ who use technology
g. In the spirit of the popularity weighting rule (3), we now modify
the decision rule (6) used in sections 5 and 6 and suppose that
players use the decision rule

(12)  "Play g at period t+1 iff
      $u^g_t(\theta) - u^f_t(\theta) \geq m(1 - 2x_t(\theta))$,"

where, as before, the parameter m indexes the importance of
popularity in the players' decisions.

Since  the  analysis  of  this  system  is  quite  close  to  that  of 
the  system  without  popularity  weighting,  we  will  give  the 
results  without  proof.  As  in  section  5,  if  the  state  in  period 
t  corresponds  to  a  cut-off  rule,  so  will  the  state  in  period 
t+1.  In  addition,  without  noise  terms  the  system  has  the  same, 
unique,  steady-state  cut-off  9  =  - (2/3+1) w/2.  However,  the 
introduction  of  popularity  weighting  does  change  the  dynamics  in 
two  ways.  First,  in  the  absence  of  noise  terms,  the  system 
converges  to  the  steady-state  cut-off  from  any  initial  cut-off; 
the  oscillations  described  in  proposition  4  do  not  arise. 
Second (and relatedly), movements of less than one window width
become more common, as players are more hesitant to adopt a less
popular technology.

The  following  proposition  gives  a  more  precise  description 
of  the  dynamics. 

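Proposition 8: From an initial cut-off $\hat\theta_t$, the system
described by decision rule (12) and payoffs (5) evolves according to

(13)
$$\hat\theta_{t+1} = \begin{cases}
\hat\theta_t + w & \text{if } \hat\theta_t < \theta^*_t - (m + w/2) \\
\theta^*_t + \dfrac{2m-w}{2m+w}\,(\hat\theta_t - \theta^*_t) & \text{if } \hat\theta_t \in [\theta^*_t - (m + w/2),\ \theta^*_t + (m + w/2)] \\
\hat\theta_t - w & \text{if } \hat\theta_t > \theta^*_t + (m + w/2).
\end{cases}$$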

Proof:  Omitted.  The  calculations  involved  are  straightforward, 
and  quite  similar  to  those  of  proposition  4.  Note  that  the 
dynamics  above  reduce  to  those  of  proposition  4  when  m  =  0,  as 
they  should  do. 

To see that, in the absence of noise, the system converges to
$\theta^*$ from any initial cutoff, note that the cut-off moves a full
window width so long as $|\hat\theta_t - \theta^*| > m + w/2$.
Eventually then $|\hat\theta_t - \theta^*| \leq m + w/2$, and from then
on $\hat\theta_{t+1} - \theta^* = [(2m-w)/(2m+w)](\hat\theta_t - \theta^*)$,
so that the system converges to $\theta^*$ at a geometric rate.
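The dynamics of proposition 8 and the noiseless convergence argument can be traced with a few lines of code. This is a minimal sketch rather than the authors' own computation; the function names `next_cutoff` and `noiseless_path` are hypothetical, and `target` stands for the period-t value $\theta^*_t$.

```python
def next_cutoff(cutoff, target, m, w):
    """One step of the cut-off dynamics (13); `target` plays the role of theta*_t."""
    gap = cutoff - target
    band = m + w / 2.0
    if gap < -band:
        return cutoff + w  # far below the target: move up a full window width
    if gap > band:
        return cutoff - w  # far above the target: move down a full window width
    c = (2.0 * m - w) / (2.0 * m + w)
    return target + c * gap  # partial adjustment inside the band


def noiseless_path(start, theta_star, m, w, periods):
    """Cut-off path when theta*_t equals theta_star in every period (no noise)."""
    path = [start]
    for _ in range(periods):
        path.append(next_cutoff(path[-1], theta_star, m, w))
    return path


if __name__ == "__main__":
    # Full-width steps first, then geometric convergence at rate (2m-w)/(2m+w).
    print(noiseless_path(start=10.0, theta_star=0.0, m=1.0, w=1.0, periods=15))
    # With m = 0 the middle case is a reflection about the target.
    print(next_cutoff(0.3, 0.0, m=0.0, w=1.0))
```

Setting m = 0 in the sketch reproduces the reflection about the target that underlies the oscillations of the system without popularity weighting, while any m > 0 makes the contraction factor strictly smaller than one in absolute value.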

Note also that for a given $\hat\theta_t$, the system will move less
than a full window width whenever the realization of $\theta^*_t$ is
in an interval of length $2m + w$. This shows that popularity
weighting makes the system more "sluggish," and suggests that it will
reduce the variance of the long-run distribution. To verify this
intuition, and determine the extent to which popularity weighting
reduces the variance, we characterize the long-run distribution in one
special case.

Proposition 9: (a) If the $z_t$ are i.i.d. draws from a distribution
that has a strictly positive density on a compact support, the dynamic
process defined by (5) and (12) has a unique invariant distribution.
(b) If the $z_t$ are i.i.d. draws from the uniform distribution on
$[-\sigma, \sigma]$ and $m \geq 2\sigma$, the invariant distribution
$\mu$ is concentrated on the interval
$[\theta^* - \sigma - w/2,\ \theta^* + \sigma + w/2]$ and satisfies

$$E_\mu(\hat\theta) = \theta^*,$$

and

$$\mathrm{var}_\mu(\hat\theta) = \sigma^2 w/6m.$$

Proof: (a) Omitted; the argument is very close to that for
proposition 6.

(b) Appendix D shows that there is a deterministic, finite time T for
which the cut-off $\hat\theta_T$ is in the interval
$[\theta^* - \sigma - w/2,\ \theta^* + \sigma + w/2]$, and that once
this interval is reached, $\hat\theta_{T+s}$ remains in the interval
for all subsequent periods T+s.

Given a T satisfying these claims, we have
$|\hat\theta_{T+s} - \theta^*_{T+s}| \leq |\hat\theta_{T+s} - \theta^*| + |\theta^*_{T+s} - \theta^*| \leq (\sigma + w/2) + \sigma$,
which is less than $m + w/2$ from our assumption that $m > 2\sigma$.
Hence the evolution of $\hat\theta_t$ from T on is determined by the
second case in proposition 8.

Writing $c = (2m-w)/(2m+w)$, so that
$\hat\theta_{t+1} = (1-c)\theta^*_t + c\,\hat\theta_t$, and applying
this rule repeatedly, we find that

$$\hat\theta_{T+s} = (1-c)\sum_{\tau=0}^{s-1} c^{\tau}\,\theta^*_{T+s-1-\tau} + c^{s}\,\hat\theta_T.$$

Hence,

$$E(\hat\theta_{T+s} \mid \hat\theta_T) = (1-c)\sum_{\tau=0}^{s-1} c^{\tau} E(\theta^*_t) + c^{s}\,\hat\theta_T \to E(\theta^*_t) = \theta^*,$$

and

$$\mathrm{var}(\hat\theta_{T+s} \mid \hat\theta_T) = (1-c)^2 \sum_{\tau=0}^{s-1} c^{2\tau}\,\mathrm{var}(\theta^*_t) \to \frac{(1-c)\,\sigma^2}{3(1+c)} = \frac{2w\sigma^2}{12m} = \frac{\sigma^2 w}{6m}. \quad ■$$
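The mean and variance in part (b) can be checked by simulation. The following is a rough Monte Carlo sketch, not part of the original analysis: it assumes $\theta^*_t = \theta^* + z_t$ with $z_t$ uniform on $[-\sigma, \sigma]$, starts the cut-off at $\theta^*$, and uses illustrative parameter values. The names `simulate_cutoffs` and `mean_and_variance` are hypothetical.

```python
import random


def simulate_cutoffs(theta_star, sigma, m, w, periods, seed=0):
    """Simulate the cut-off process of proposition 8 with uniform shocks z_t."""
    rng = random.Random(seed)
    c = (2.0 * m - w) / (2.0 * m + w)
    band = m + w / 2.0
    cutoff = theta_star  # assumed starting point, inside the invariant interval
    history = []
    for _ in range(periods):
        target = theta_star + rng.uniform(-sigma, sigma)  # theta*_t
        gap = cutoff - target
        if gap < -band:
            cutoff += w
        elif gap > band:
            cutoff -= w
        else:
            cutoff = target + c * gap
        history.append(cutoff)
    return history


def mean_and_variance(values):
    """Sample mean and (population) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


if __name__ == "__main__":
    sigma, m, w = 1.0, 2.0, 1.0  # illustrative values with m >= 2 * sigma
    mean, var = mean_and_variance(simulate_cutoffs(0.0, sigma, m, w, 200_000))
    print(mean, var, sigma ** 2 * w / (6.0 * m))
```

With these illustrative parameters the sample variance should lie close to $\sigma^2 w/6m$, and the full-window-width branches are never triggered because the cut-off stays inside the band once $m \geq 2\sigma$.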
Comparing the steady state distributions for $m > 2\sigma$ with that
for $m = 0$, we see that popularity weighting reduces the long-run
variance by a factor of $\sigma/3m$.

The welfare consequences of increasing m for fixed w are similar to
those of decreasing w for fixed m: in both cases, the steady state
distribution becomes more efficient, while the speed at which the
system converges decreases. It may be interesting to note, however,
that in this simple model there is one way to change the parameters to
speed up the rate of convergence (when the initial cutoff is far from
the optimum) without altering the steady-state variance, namely
increasing the window width w while holding the ratio of w/m fixed.²¹

8. Concluding Remarks

The  various  models  we  have  presented  suggest  that  even 
very  naive  learning  rules  can  lead  to  quite  efficient  long-run 
social  states,  at  least  if  the  environment  is  not  too  highly 
nonlinear.  Moreover,  popularity  weighting  can  contribute  to  this 
long-run  efficiency,  and  the  use  of  popularity-weighting  passes 
a  crude  first-cut  test  of  consistency  with  individual 
incentives.  Of  course,  there  are  many  other  plausible 
specifications  of  behavior  rules  for  social  learning,  so  it  is 
interesting  to  speculate  about  the  robustness  of  our 
conclusions. 

In this light, we would like to report simulation results for one
simple modification of popularity weighting that seems to improve the
short-run performance of the system without changing its long-run
behavior. For this purpose, we return to the homogeneous-population
model of section 4, and now suppose that players give weight to trends
in the relative popularity of the two technologies as well as to the
popularity itself.

More precisely, suppose that players now choose technology g iff the
realized difference in payoffs $u^g_t - u^f_t$ exceeds the expression
$m(1 - 2x_t) - c(x_t - x_{t-1})$, where $x_t - x_{t-1}$ is the trend
in popularity. Since the trend variable converges to zero along any
path where the system converges to a steady state, the system still
converges to the better technology with probability 1 when
$m = \sigma$. However, if the initial state is far from the optimum,
as in the case when a superior technology is first introduced, one
would expect that responsiveness to trends would help to increase the
speed with which the new technology is adopted.

To test this intuition, we ran two simulations, both with the noise
term $\epsilon_t$ uniformly distributed on $[-\sigma, \sigma]$ and
popularity weighting $m = \sigma$. In the first, the fraction $\alpha$
who adjust each period was .5, and the mean payoff difference $\theta$
was $.1\sigma$; in the second, $\alpha = .1$ and
$\theta = .02\sigma$. In both cases, we counted the number of periods
required for the system to move from initial state $x_0 = .05\sigma$
to $x_t = .99\sigma$. The results, reported in Table 2, show that at
least one obvious modification of our assumptions reinforces the
impression that naive rules can perform fairly well.


            $\alpha = .5$, $\theta = .1$    $\alpha = .1$, $\theta = .02$

c = 0                  39                          940

c = 5                  26                          710

c = 10                 28                          470

Table 2: Trend Weighting and the Speed²²
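A simulation of this kind can be sketched as follows. This is not the authors' code, and it is not expected to reproduce the entries of Table 2 exactly. It assumes that the fraction $\alpha$ who adjust all see the same realized payoff difference $\theta + \epsilon_t$ (as in system (4)), normalizes $\sigma = 1$ so that the share moves from .05 to .99, and takes the initial trend to be zero. The function `periods_to_adopt` and its parameters are hypothetical names.

```python
import random


def periods_to_adopt(alpha, theta, sigma=1.0, m=1.0, c=0.0,
                     x0=0.05, target=0.99, seed=0, max_periods=1_000_000):
    """Count periods until the share x of g-users first reaches `target`.

    Adjusting players choose g iff theta + eps exceeds
    m * (1 - 2 * x_t) - c * (x_t - x_{t-1}).
    Assumption: x_{-1} = x_0, so the initial trend is zero.
    Returns None if `target` is not reached within `max_periods`.
    """
    rng = random.Random(seed)
    x_prev = x = x0
    for period in range(1, max_periods + 1):
        eps = rng.uniform(-sigma, sigma)
        threshold = m * (1.0 - 2.0 * x) - c * (x - x_prev)
        x_prev = x
        if theta + eps > threshold:
            x = (1.0 - alpha) * x + alpha  # adjusters switch to g
        else:
            x = (1.0 - alpha) * x          # adjusters switch to f
        if x >= target:
            return period
    return None


if __name__ == "__main__":
    for trend_weight in (0.0, 5.0, 10.0):
        runs = [periods_to_adopt(0.5, 0.1, c=trend_weight, seed=s)
                for s in range(200)]
        print(trend_weight, sum(runs) / len(runs))
```

Averaging over many seeds, as in the usage lines, gives a rough sense of how the trend weight c changes adoption speed under these assumptions.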


There  are  a  number  of  other  extensions  that  we  have  not 
considered  but  seem  important.  Players  might  use  rules  of  thumb 
which  make  some  use  of  historical  data.  Also,  players  might  be 
arranged  in  more  complex  networks  than  the  simple  linear 
structure  we  have  considered.  Finally,  our  results  suppose 
either  that  rules  of  thumb  are  exogenous,  or,  in  Proposition  3, 
are  equilibrium  choices  of  a  static  game.  It  would  be 
interesting  to  complement  these  results  with  an  analysis  of  a 
dynamic  process  by  which  players  adjust  their  rules  of  thumb 
along  with  their  choice  of  technology. 

Finally, we should point out that popularity weighting is not always
as beneficial as our results might suggest. Consider the problem of
children in a poor neighborhood choosing whether to pursue higher
education. If students who have done so in the past tend to move out
of the neighborhood, and past residents are underrepresented in the
observation windows, then the choice of higher education will appear
less popular than it really is, and decisions based on popularity may
be biased against this choice.²³


Appendix A: Optimal Popularity Weighting with Other Distributions

To better understand the forces generating proposition 2a (that a
single choice of popularity weight yields the optimal long-run
distribution uniformly over all values of $\theta$), we show that
analogous results obtain when the per-period noise term $\epsilon_t$
has distribution F with unbounded support.

Suppose first that $\alpha = 1$, so that the entire population
adjusts every period, and hence the state $x_t$ takes on only the
values 0 and 1. If we let $s_t$ denote the vector
$[\mathrm{Prob}(x_t = 0),\ \mathrm{Prob}(x_t = 1)]$, we have
$s_{t+1} = s_t A$, where the transition matrix is

$$A = \begin{pmatrix} F(m-\theta) & 1-F(m-\theta) \\ F(-m-\theta) & 1-F(-m-\theta) \end{pmatrix}.$$

Since this matrix is strictly positive, the system is ergodic; the
unique invariant distribution $\mu^*$ is given by

$$\mu^*(x=0) = F(-m-\theta)/[F(-m-\theta) + 1 - F(m-\theta)].$$
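As a quick numerical illustration of this formula, the sketch below evaluates $\mu^*(x=0)$ for a standard normal F and compares it with repeated application of $s_{t+1} = s_t A$. The helper names (`normal_cdf`, `invariant_prob_zero`, `iterate_chain`) and the parameter values are illustrative assumptions, not from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def invariant_prob_zero(m, theta, cdf=normal_cdf):
    """Closed form for mu*(x = 0) in the two-state chain with alpha = 1."""
    to_zero = cdf(-m - theta)      # probability of moving from x = 1 to x = 0
    to_one = 1.0 - cdf(m - theta)  # probability of moving from x = 0 to x = 1
    return to_zero / (to_zero + to_one)


def iterate_chain(m, theta, steps=2000, cdf=normal_cdf):
    """Apply s_{t+1} = s_t A repeatedly, starting from s_0 = [1/2, 1/2]."""
    a00, a01 = cdf(m - theta), 1.0 - cdf(m - theta)
    a10, a11 = cdf(-m - theta), 1.0 - cdf(-m - theta)
    s0, s1 = 0.5, 0.5
    for _ in range(steps):
        s0, s1 = s0 * a00 + s1 * a10, s0 * a01 + s1 * a11
    return s0


if __name__ == "__main__":
    for weight in (0.0, 1.0, 2.0, 3.0):
        print(weight, invariant_prob_zero(weight, theta=0.5))
```

For a positive $\theta$ the printed probabilities of the wrong state fall as m grows, in line with the limiting argument for the normal distribution.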

If F is the standard normal distribution, then as m increases, the
ratio $F(-m-\theta)/[1 - F(m-\theta)]$ converges to 0 if $\theta > 0$,
and converges to $\infty$ if $\theta < 0$. Hence for large m, the
ergodic distribution of the system places probability near 1 on the
correct choice. Moreover, the same is true for any distribution for
which the ratio $F(-m-\theta)/[1 - F(m-\theta)]$ converges to 0 if
$\theta > 0$, and to $\infty$ if $\theta < 0$. (This is what is meant
by saying that the tails of the distribution are "infinitely
revealing.")

With a more involved argument, we have shown that the same
conclusion holds for any $\alpha \in (0,1)$ when players use the
(discontinuous) popularity weighting "if $x_t \geq 1/2$, choose g iff
$u^g_t - u^f_t \geq -m$; if $x_t < 1/2$, choose g iff
$u^g_t - u^f_t \geq m$." The details are available on request; here is
the intuition for the result.
Note first that when $m = \infty$ the system is deterministic with
stable  steady  states  at  0  and  1.  If  m  is  finite  but  very  large 
compared  to  a  and  to  the  distribution,  then  steps  the  "wrong 
way"  (i.e.  decreasing  steps  when  x.  >  1/2  )  are  rare 
"innovations",  and  when  the  distribution  is  symmetric,  transits 


from 0 to 1/2 and from 1 to 1/2 both take the same number of
innovations. If the tails of the distribution are infinitely
revealing, then as $m \to \infty$ innovations towards the better
technology become infinitely more likely than innovations towards the
inferior one, and the analysis of Freidlin and Wentzell [1984]
suggests that the limit of the ergodic distributions will be
concentrated on the better technology. To establish this formally, we
partition the interval into a large number of (appropriately chosen)
small subintervals, and approximate the original system by two
finite-state Markov processes, whose ergodic distributions will serve
as bounds on the ergodic distribution of the original system. We then
use the discrete-time, finite-state translation of Freidlin and
Wentzell's results (Kandori, Mailath and Rob [1992], Young [1992]) to
confirm the intuition above, i.e. the limits of the ergodic
distributions of the finite-state process are concentrated on the
subinterval corresponding to the better choice. The above suggests
that infinitely-revealing tails are sufficient for there to be a
single popularity rule that is approximately optimal for all
$\theta$. Moreover, this rule has the nice feature that it need not be
tailored to the exact form of the distribution. Even when the tails
are not infinitely revealing, however, there is another popularity
rule that seems to perform very well, namely "choose g iff
$u^g_t - u^f_t \geq F^{-1}(1 - x_t)$."

With this rule,

$$E(x_{t+1} \mid x_t) = (1-\alpha)x_t + \alpha\,\mathrm{Prob}[\theta + \epsilon_t \geq F^{-1}(1 - x_t)] = x_t + \alpha[F(\theta + F^{-1}(x_t)) - x_t],$$

so that $E(x_{t+1} \mid x_t) > x_t$ if and only if $\theta > 0$; the
system tends to drift towards the correct choice. Although the system
may converge to the wrong technology with positive probability, the
simulation results reported in table 2 for the logistic and Laplace
distributions (which both have non-revealing tails) suggest that when
$\alpha$ is small the system is very likely to converge to the right
choice. Intuitively, when $\alpha$ is small, the system evolves
through a series of small steps that allow the drift to outweigh the
random forces. We conjecture that there may be a general result along
these lines.
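The drift property of the rule "choose g iff $u^g_t - u^f_t \geq F^{-1}(1 - x_t)$" is easy to verify numerically for a particular symmetric distribution. The sketch below uses the standard logistic distribution as an illustrative choice; the function names are hypothetical.

```python
import math


def logistic_cdf(z):
    """Distribution function F of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_quantile(p):
    """Inverse F^{-1} of the standard logistic distribution function."""
    return math.log(p / (1.0 - p))


def expected_next_share(x, theta, alpha):
    """E(x_{t+1} | x_t = x) when g is chosen iff theta + eps >= F^{-1}(1 - x)."""
    prob_g = 1.0 - logistic_cdf(logistic_quantile(1.0 - x) - theta)
    return (1.0 - alpha) * x + alpha * prob_g


if __name__ == "__main__":
    for share in (0.1, 0.5, 0.9):
        # Expected change in the share: positive for theta > 0, negative for theta < 0.
        print(share,
              expected_next_share(share, 0.2, 0.1) - share,
              expected_next_share(share, -0.2, 0.1) - share)
```

Because the logistic distribution is symmetric, the probability of choosing g equals $F(\theta + F^{-1}(x_t))$, which is the second form of the expression in the text.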
Appendix B: Proof of Proposition 2c

If $\sigma - m > |\theta|$, then neither 0 nor 1 is an absorbing
state. Our first step is to show that there is a unique invariant
distribution. To do so, we first note that the stochastic system (4)
is a random contraction in the sense of Norman [1972]. A random
contraction is a stochastic system in which the realization of an
i.i.d. auxiliary variable (call it $\omega$) is used to determine
which of a family of mappings $\varphi \in \Phi$ is used to send $x_t$
to $x_{t+1}$, and each $\varphi$ is a contraction "on average." In our
context, $\omega$ corresponds to the realized difference in payoffs,
and there are only two maps $\varphi$:
$\varphi_+(x_t) = (1-\alpha)x_t + \alpha$, and
$\varphi_-(x_t) = (1-\alpha)x_t$, both of which are contractions, so
that (4) is indeed a random contraction. Norman's results then imply
that the Markov operator associated with system (4) is quasi-compact.
We next note that when $|\theta| < \sigma - m$, the system (4)
satisfies the uniqueness criterion 2.11 of Futia [1982]: for any
neighborhood U of the point x = 1/2, and any point x' in [0,1], there
is an n such that the probability the system starting at x' is in U
exactly n periods later is strictly positive. (If $m \geq \sigma$, the
uniqueness condition fails, as both x = 0 and x = 1 are absorbing.)

The last step is to compute the mean and variance of the invariant
distribution $\mu$. Using $E_\mu(x_t) = E_\mu(x_{t+1})$, we have

$$E_\mu(x) = (1-\alpha)E_\mu(x) + \alpha\int p(x)\,d\mu(x),$$

where $p(x_t) = (\sigma - m + \theta)/2\sigma + (m/\sigma)x_t$ is the
probability that $\theta + \epsilon_t \geq m(1 - 2x_t)$, which is the
probability that $x_{t+1} = (1-\alpha)x_t + \alpha$. Simple algebra
then shows that $E_\mu x = 1/2 + \theta/2(\sigma - m)$.

To compute the variance, we first write the identity

$$E_\mu(x^2) = \int \Big[(1-p(x))[(1-\alpha)x]^2 + p(x)[(1-\alpha)x + \alpha]^2\Big]\,d\mu(x)$$
$$= E_\mu(x^2)\Big[(1-\alpha)^2 + 2\alpha(1-\alpha)m/\sigma\Big] + E_\mu(x)\Big[2\alpha(1-\alpha)(\sigma - m + \theta)/2\sigma + \alpha^2 m/\sigma\Big] + \alpha^2(\sigma - m + \theta)/2\sigma;$$

solving for $E_\mu(x^2)$ and computing
$\mathrm{var}(x) = E_\mu(x^2) - (E_\mu(x))^2$ gives the desired
result. ■
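The mean of the invariant distribution derived above can be checked against a long simulated path. The sketch below is an illustration under stated assumptions: the noise is uniform on $[-\sigma, \sigma]$, the adjusting fraction $\alpha$ all respond to the same realized payoff difference, and the parameter values are chosen so that $\sigma - m > |\theta|$. The name `simulate_mean_share` is hypothetical.

```python
import random


def simulate_mean_share(alpha, theta, sigma, m, periods, seed=0):
    """Time-average of x_t for x_{t+1} = (1 - alpha) x_t + alpha * 1[g is chosen]."""
    rng = random.Random(seed)
    x = 0.5  # assumed starting point
    total = 0.0
    for _ in range(periods):
        eps = rng.uniform(-sigma, sigma)
        if theta + eps >= m * (1.0 - 2.0 * x):
            x = (1.0 - alpha) * x + alpha
        else:
            x = (1.0 - alpha) * x
        total += x
    return total / periods


if __name__ == "__main__":
    alpha, theta, sigma, m = 0.1, 0.1, 1.0, 0.5  # illustrative values
    predicted = 0.5 + theta / (2.0 * (sigma - m))
    print(simulate_mean_share(alpha, theta, sigma, m, 200_000), predicted)
```

The time-average should settle near $1/2 + \theta/2(\sigma - m)$; the same loop could be extended to track the second moment if a check of the variance is wanted.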

APPENDIX C: PROOF OF PROPOSITION 6

To begin we rewrite (10) in the equivalent form (10')

(10')
$$\hat\theta_{t+1} = \begin{cases}
\min[\hat\theta_t + w,\ 2(\theta^* + z_t) - \hat\theta_t] & \text{if } \theta^* + z_t \geq \hat\theta_t \\
\max[\hat\theta_t - w,\ 2(\theta^* + z_t) - \hat\theta_t] & \text{if } \theta^* + z_t < \hat\theta_t.
\end{cases}$$

To show that the system (10) is a "random dynamical system" as
described by Futia [1982], we note that the auxiliary events are the
$z_t$. The probability distribution Q on the z's does not depend on
the current state, and so in particular is continuous in the state,
and the map $\varphi(\theta, z)$ defined by
$\hat\theta_{t+1} = \varphi(\hat\theta_t, z_t)$ is easily seen to be
continuous in $\theta$ for fixed z, so that (10) is indeed a random
dynamical system.

Next we check that it is a random contraction, as in Futia's
definition 6.2. Because the map Q is constant in $\theta$, the
constant M in part (a) of the definition can be taken to equal 0.
Next we must show that for all z, and all $\theta \neq \theta'$,
$d(\varphi(\theta,z), \varphi(\theta',z)) \leq d(\theta,\theta')$, and
that for all $\theta$ and $\theta'$ there is a positive probability of
z such that
$d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$.

To show that
$d(\varphi(\theta,z), \varphi(\theta',z)) \leq d(\theta,\theta')$, we
note that for all $\theta$ and $\theta'$ and all z, either (a) both
$\theta$ and $\theta'$ move in the same direction (e.g.
$(\varphi(\theta,z) - \theta)(\varphi(\theta',z) - \theta') > 0$) or
(b) $\varphi(\theta,z) - \theta \geq 0 \geq \varphi(\theta',z) - \theta'$.
Case (a) has three subcases: either (1) $\varphi$ moves both locations
by w, so that
$d(\varphi(\theta,z), \varphi(\theta',z)) = d(\theta,\theta')$; or (2)
the location closer to $\theta^* + z_t$ moves less than w, and the
state farther away moves w, so that
$d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$; or (3)
both locations move by less than w, in which case the two locations
are reflected about the point $\theta^* + z_t$, and
$d(\varphi(\theta,z), \varphi(\theta',z)) = d(\theta,\theta')$.

In case (b), suppose w.l.o.g. that $\theta < \theta'$; then case (b)
implies that $\theta \leq \theta^* + z \leq \theta'$, and so
$d(\theta,\theta') = d(\theta, \theta^* + z) + d(\theta^* + z, \theta')$.
Using the triangle inequality, this implies that

(C1)
$$d(\varphi(\theta,z), \varphi(\theta',z)) - d(\theta,\theta') \leq d(\varphi(\theta,z), \theta^* + z) + d(\theta^* + z, \varphi(\theta',z)) - d(\theta, \theta^* + z) - d(\theta^* + z, \theta')$$
$$= [d(\varphi(\theta,z), \theta^* + z) - d(\theta, \theta^* + z)] + [d(\theta^* + z, \varphi(\theta',z)) - d(\theta^* + z, \theta')],$$

and inspection of (10') shows that each of the terms in square
brackets is non-positive. Thus
$d(\varphi(\theta,z), \varphi(\theta',z)) \leq d(\theta,\theta')$ for
all z, $\theta$, and $\theta'$.

To show that for all $\theta$ and $\theta'$ there is positive
probability that
$d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$, let
$\theta < \theta'$, and suppose first that
$\theta - \theta^* > -\sigma + w/2$. Then for sufficiently small
$\epsilon > 0$ there is a positive probability that z lies in any
sufficiently small neighborhood of
$\theta - \theta^* + \epsilon - w/2$, and for z's in this
neighborhood, $\theta$ moves less than w to the left, while $\theta'$
moves w, so that
$d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$. If
$\theta - \theta^* \leq -\sigma + w/2$ but
$\theta' - \theta^* < \sigma - w/2$, a similar argument establishes
the existence of a range of z's such that both $\theta$ and $\theta'$
move to the
right,  with  8'    moving  less  than  8.       Finally,  ife-e<-o-  + 

* 
w/2   and  8'  -8      £  cr  -w/2,  then  8' -8    >  w,  and  &{<p(6  ,z)  ,    ip(8',z))    < 

d (8,8')     for  z's  in  a  neighborhood  of  e+w/2.   Thus  (10')  is  a 

random  contraction. 

The last step in the proof is to verify that (10') satisfies Futia's
uniqueness condition 2.11, which requires that there be a point
$\bar\theta$ such that for any neighborhood U of $\bar\theta$ and any
$\theta$, there is an n such that when the system begins at $\theta$,
it has a positive probability of being in U in period n. It is easy to
see that e.g. $\bar\theta = \theta^*$ satisfies this condition. ■

APPENDIX D: Proof of Proposition 9, part b.

To complete the proof, we must show that there exists a
deterministic, finite time T such that (i)
$|\hat\theta_T - \theta^*| < \sigma + w/2$, and (ii) that
$|\hat\theta_{T+s} - \theta^*| < \sigma + w/2$ for all subsequent
dates T+s. Define $d_t = |\hat\theta_t - \theta^*|$. Note that since
$(\hat\theta_{t+1} - \hat\theta_t)$ and $(\theta^*_t - \hat\theta_t)$
have the same sign, and $|\theta^*_t - \theta^*| \leq \sigma$,
$(\theta^* - \hat\theta_t)$ and $(\hat\theta_{t+1} - \hat\theta_t)$
have the same sign whenever $d_t > \sigma + w/2$. Hence,

(D1)  $d_{t+1} = |d_t - |\hat\theta_{t+1} - \hat\theta_t||$ whenever $d_t > \sigma + w/2$.

As a first step towards proving (i), we show that for any initial
condition there exists a finite T' such that regardless of the sample
path, either $d_{T'} < \sigma + w/2$ or
$d_{T'} \leq |\hat\theta_{T'+1} - \hat\theta_{T'}|$. To see this, note
that (D1) implies that until such a T' is reached,
$d_t - d_{t+1} = |\hat\theta_{t+1} - \hat\theta_t|$, and from
proposition 8,

$$|\hat\theta_{t+1} - \hat\theta_t| = \min\{w,\ [2w/(2m+w)]\,|\hat\theta_t - \theta^*_t|\} \geq \min\{w,\ [2w/(2m+w)]\,w/2\}.$$

Thus, until the conditions defining T' are satisfied, the decrease in
$d_t$ is bounded below by a positive constant which is independent of
the sample path.

If $d_{T'} < \sigma + w/2$, setting T = T' completes the proof of
(i). The remaining case is
$\sigma + w/2 < d_{T'} \leq |\hat\theta_{T'+1} - \hat\theta_{T'}|$. In
this case, (D1) implies that
$d_{T'+1} = |\hat\theta_{T'+1} - \hat\theta_{T'}| - d_{T'}$, which is
less than $w - (\sigma + w/2) = w/2 - \sigma < w/2 + \sigma$. Hence we
can set T = T'+1 to complete the proof of (i).

To prove claim (ii), note that when
$|\hat\theta_T - \theta^*| < \sigma + w/2$, we have
$|\hat\theta_T - \theta^*_T| \leq (\sigma + w/2) + \sigma$, which is
less than $m + w/2$ from the assumption that $m > 2\sigma$. From
proposition 8 we then have
$\hat\theta_{T+1} = \theta^*_T + [(2m-w)/(2m+w)](\hat\theta_T - \theta^*_T)$,
and since both $\theta^*_T$ and $\hat\theta_T$ lie in the interval
$[\theta^* - \sigma - w/2,\ \theta^* + \sigma + w/2]$, so does
$\hat\theta_{T+1}$. The claim now follows from induction on s. ■


Appendix E:

This appendix gives a rough computation of how many periods would be
needed on average for a player using history to be reasonably sure he
knew which technology is better. If we let
$\sigma_u^2 = \mathrm{var}(\epsilon^g_t) + \mathrm{var}(\epsilon^f_t)$
denote the variance of $u^g_t - u^f_t$, then the player at $\theta$
will need about $(\sigma_u/\theta)^2$ observations of both
technologies to be fairly confident (about 85%, i.e. a 15% chance of a
false rejection) that he knew which technology was better. For
$\theta \leq \theta^* + (\sigma w/2)^{1/2}$, this requires on the
order of $2\sigma_u^2/\sigma w$ observations of both technologies. For
$\theta^* + (\sigma w/2)^{1/2} < \theta \leq \theta^* + 3(\sigma w/2)^{1/2}$,
at least $2\sigma_u^2/9\sigma w$ observations of both technologies are
required.

Our next step is to approximate the frequency with which these
observations arrive. To do so, we approximate the steady-state
distribution by a normal distribution, and then note that the density
of the normal is less than $.25/\sigma$ at points more than 1 standard
deviation away from the mean, and that the density of the normal at
the mean is less than $.5/\sigma$. Finally, we recall that only
players within w of the current period's cutoff see both technologies
being used. Thus we conclude that for players within one standard
deviation of $\theta^*$ the required observations arrive on average
every $[2w \cdot .5/\sigma]^{-1} = \sigma/w$ periods; since these
players need about $2\sigma_u^2/\sigma w$ observations,
$2\sigma_u^2/w^2$ periods are required. For players between 1 and 3
standard deviations away, the required observations arrive about every
$2\sigma/w$ periods; these players need $2\sigma_u^2/9\sigma w$
observations, so that about $4\sigma_u^2/9w^2$ periods are required.


REFERENCES 

Apodaca, A. [1952] "Corn and Custom: Introduction of Hybrid Corn
to Spanish American Farmers in New Mexico,"
in E.H. Spicer, ed., Human Problems in Technological
Change, Russell Sage Foundation, New York.

Banerjee,  A.  [1991a]  "The  Economics  of  Rumours,"  forthcoming 
in  the  Review  of  Economic  Studies 

Banerjee, A. [1991b] "A Simple Model of Herd Behavior,"
forthcoming in the Quarterly Journal of Economics.

Bikhchandani, S., D. Hirshleifer, and I. Welch [1991] "A Theory
of Fads, Fashion, Custom, and Cultural Change as
Informational Cascades,"
forthcoming in the Journal of Political Economy.

Bush,  R.R.  and  F.  Mosteller  [1955]  Stochastic  Models  for  Learning. 
Wiley,   New  York. 

Chambers, J.D. and G.D. Mingay [1966] The Agricultural Revolution
1750-1880, Schocken, New York.

Conlisk, J. [1980] "Costly Optimizers versus Cheap
Imitators," Journal of Economic Behavior and
Organization, 1:275-293.

Cross,   J.G.  [1983]   A  Theory  of  Adaptive  Economic   Behavior. 
Cambridge  University  Press 

Ellison,    G.    [1991]    "Learning,    Local    Interaction, 
and  Coordination,"  mimeo. 

Ernle, R.E.P. [1912] English Farming Past and Present.
Longmans Green and Co., London.

Futia, C. [1982] "Invariant Distributions and the Limiting
Behavior of Markovian Economic Models," Econometrica, 50, 377-408.

Kandori, M., G. Mailath, and R. Rob [1992] "Learning,
Mutation, and Long Run Equilibrium in Games," mimeo.

Kerridge, E. [1967] The Agricultural Revolution. George Allen &
Unwin, London.

Manski,  C.  [1990]  "Dynamic  Choice  in  a  Social  Setting,"  University 
of  Wisconsin  SSRI  discussion  paper  9003. 

Mantoux, P. [1905] The Industrial Revolution in the 18th
Century. Harper and Row, N.Y.; reprinted in 1961.

Mingay, G.E. [1977] The Agricultural Revolution. Adam & Charles
Black, London.

Norman, M.F. [1968] "Some Convergence Theorems for Stochastic
Learning Systems with Distance-Diminishing Operators,"
J. Mathematical Psychology, 5, 61-101.

Norman, M.F. [1972] Markov Processes and Learning Models, Academic
Press.

Rogers, E. with F. Shoemaker [1971] Communication of Innovations:
A Cross-Cultural Approach. 2nd edition, The Free Press,
New York.

Rothschild, M. [1974] "A Two-Armed Bandit Theory of Market
Pricing," J. Econ. Theory. 9, 185-202.

Ryan,  B.  and  N.  Gross  [1943]  "The  Diffusion  of  Hybrid  Seed 
Corn  in  Two  Iowa  Communities,"  Rural  Sociology 

Schmalensee, R. [1975] "Alternative Models of Bandit Selection,"
J. Econ. Theory. 10, 333-342.

Slicher van Bath, H.S. [1965] The Agrarian History of Europe A.D.
xxxx-1850. Edward Arnold, London.

Smith,  L.  [1990]  "Error  Persistence  and  Experimental  versus 
Observational  Learning,"  mimeo. 

Timmer, C.P. [1969] "The Turnip, the New Husbandry, and the
English Agricultural Revolution," Quarterly Journal of
Economics, 83, 375-395.


NOTES

1. See Rogers and Shoemaker [1971] for an extensive discussion of
empirical research on adoption processes, especially in
development. Mansfield [1968] and Ryan and Gross [1943] are
classic studies of technology adoption in basic industries and
agriculture, respectively.


2. Cross [1983] develops a model of boundedly rational,
adaptive choice with a similar information structure.

3. Smith's paper models the pricing decisions of
monopolistically competitive firms, for which the assumption of
unobserved payoffs seems more plausible. Banerjee [1991a] is a
model of investment decisions; Banerjee [1991b] is less explicit
about its intended interpretations. Bikhchandani et al. suggest
that their model is appropriate for a wide range of decision
problems, including that of technology adoption.

4. Manski [1990] considers heterogeneous populations from an
econometric viewpoint. His paper provides a sufficient condition
for an individual to be able to obtain a consistent estimate of
his own optimal choice by observing the choices and outcomes of
others.

5. Note that when the players are heterogeneous, a central
planner would need to know the relative payoffs of the
competing technologies for every player in order to implement
the optimum by fiat. Centrally-based agricultural reformers are
often hampered by their lack of understanding of the variation
in farmers' tastes and production costs. For example, Apodaca
[1952] describes how a planner tried to induce a New Mexico
community to adopt a hybrid corn. The innovation was adopted
and then discontinued despite doubling yields, as the villagers
decided the taste and consistency of the corn were inappropriate
for making tortillas.

6. (In the second model the environment is complicated enough that a
great many periods would be required to obtain good estimates, as
we discuss in section 5.)

7. See Mingay [1977] and Slicher van Bath [1965] for more
thorough descriptions of the technology. Timmer [1969]
discusses the extent of the resulting gains in productivity.

8. Kerridge [1967] pp. 28-34, 339-341.

9. Chambers and Mingay [1966] p. 55.

10. See Timmer [1969] and Kerridge [1967]. Slicher van Bath [1965]
p. 243 gives a similar figure for the rate of diffusion in
France.

11. See Timmer [1969] for an excellent summary of this debate.

12. The consideration of inertia is further motivated by the
empirical evidence that there is typically a substantial lag
between the time individuals first learn of the existence of a
technology and the time they adopt it. Ryan and Gross [1943]
found that farmers in two rural communities on average adopted
hybrid seed corn 9 years after they first heard of the
innovation. Other studies cited in Rogers and Shoemaker [1971],
p. 129, report lags of 2-4 years for the adoption of weed spray
in Iowa and fertilizer in Pakistan. Note that the spread of
literacy and modern communication media will speed up the rate
at which farmers become aware of a new technology's existence,
but does not seem to have eliminated the lag between becoming
informed and deciding to adopt.

13. If we interpret $x_t$ as the probability that a single individual
chooses g, as opposed to the population fraction, (2) is an
example of the linear stochastic learning theory (LSLT) of Bush
and Mosteller [1955]. This theory describes a traditional
one-player bandit problem, in which only one arm is observed in
each period; it is also assumed that the only possible outcomes
are "reward" and "failure," with the probability of arm i being
rewarded being $\pi_i$. Our model corresponds to the special case in
which $\pi_1 = 1 - \pi_2$, and the two arms' outcomes in a given period
are perfectly negatively correlated. (In the bandit problem, the
joint distribution is irrelevant.) See Schmalensee [1975] for a
brief survey of these models, and a discussion of their
applicability to market pricing. In our system (2),
Schmalensee's constants $L_1$, $L_2$, $G_1$, and $G_2$ all equal $\alpha$, while
Schmalensee's $\alpha_1$ and $\alpha_2$ both equal 1, and his $\beta_1$ and $\beta_2$ equal 0.
Schmalensee argues that in some cases the behavior these models
prescribe, with both actions taken infinitely often, is more
realistic than that of the optimal solution to the discounted
bandit problem.

14. The empirical literature suggests that popularity weighting is
a factor, but reliable estimates of m are hard to come by.
Rogers and Shoemaker (op. cit., p. 142) say that "many students
of peasant life feel" that innovations must be 20% to 30% better
to be adopted; they also cite a President's Science Advisory
Committee figure of 50% to 100%. From our reading, it is not
clear whether these premia reflect popularity weighting or
inertia.

15. We should warn the reader that the results we obtain for the
uniform case will seem to rely on the fact that this
distribution has compact support: an observation that $u^g - u^f > \sigma$
implies that $\theta$ is greater than zero. However, compact support
is not what underlies our conclusions. Appendix A shows that the
non-linear rule "only switch if the observed payoff difference
is large compared to the popularity" leads to a long-run
distribution that places most of its weight on the better choice
whenever the distribution of errors is "infinitely revealing in
the tails." The appendix also reports simulations of a more
complex rule that seems to work well even when the tails are not
infinitely revealing.

16. An alternative explanation is to use the fact that the rule
$m = \sigma$ yields the optimal long-run decision, and that since $\sigma$ is
the standard deviation of the per-period payoff differences,
rescaling the utility function rescales $\sigma$ in the same way.


17. Unless the payoff difference is so extreme that $\theta - \sigma > m$, in
which case the rate of adoption is independent of $\theta$. Note that
the rate is also an increasing function of $\theta$ when $m = 0$,
provided that $\theta$ is smaller than $\sigma$.

18. However, the correlation is easy to explain as the result of an
optimal investment policy under complete information if adopting
the innovation requires investing in a capital good.

19. Although our leading example of very small window widths is
the English agricultural revolution, small window widths should
not be seen as requiring illiterate agents. Anecdotal evidence
suggests that farmers often distrust the information of central
authorities and experts, and prefer to see how innovations work
out in their neighborhood. Ryan and Gross [1943] found that the
experiences of neighbors were an important factor in the adoption
of hybrid seed corn by 20th century Iowa farmers.

20. Although we have not checked the details, it seems that a
combination of large window widths with a rule of
proximity-weighted averages could combine quick convergence with
a small long-run variance.

21. However, as w increases the specification bias grows. When w is
large, it may be more natural to suppose that players weight the
experience of those nearby more than that of those who are
farther away but still within their window.


22. Based on estimated standard errors, the first two digits
are correct at the .95 level.

23. We thank Roland Benabou for this observation.

24. (See Futia's [1982] survey for a summary of Norman's results,
and other techniques for establishing that the invariant
distribution is unique.)